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Methods exhibiting hnear scahng with respect to the size of the system, so called 
0(N) methods, are an essential tool for the calculation of the electronic structure of large 
systems containing many atoms. They are based on algorithms which take advantage of 
the decay properties of the density matrix. In this article the physical decay properties 
of the density matrix will first be studied for both metals and insulators. Several strate- 
gies to construct 0(N) algorithms will then be presented and critically examined. Some 
issues which are relevant only for self-consistent 0(N) methods, such as the calculation 
of the Hartree potential and mixing issues, will also be discussed. Finally some typical 
applications of 0(N) methods are briefly described. 
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1 Introduction 



The exact quantum mechanical equations for many-electron systems are highly intricate. 
Any attempt to solve these equations analytically for real systems is doomed to fail. 
Numerical methods such as Configuration Interaction based methods (McWeeny 1989, 
Fulde 1995) or Quantum Monte Carlo methods (Hammond 1994, Nightingale 1998) can in 
principle solve these many-electron equations but because of the extremely high numerical 
effort required, their applicability is rather limited in practice. 

The bulk of all practical applications is therefore done within various independent- 
electron approximations such as the Hartrcc-Fock method (A. Szabo and N. Ostlund 
1982), Density Functional methods (R. Parr and W. Yang 1989) or Tight Binding meth- 
ods (C. Goringe et al., 1997a, Majewski and Vogl 1986). A comparison of the strength 
of different methods together with a selection of some interesting applications is given 
by Wimmer (1996). Even these approximate quantum mechanical equations are still 
fairly complicated and in general not solvable by analytical methods. Finding efficient 
algorithms to solve the many-electron problem numerically within any of these approx- 
imations is imperative for the applicability of quantum mechanics to physics as well as 
to chemistry and materials science. Due to efforts in the past satisfactory algorithms are 
now available and computational electronic structure methods are making very important 
contributions to our understanding of matter at the microscopic level. The 1998 Nobel 
prize for W. Kohn and J. Pople is a landmark sign of the importance of this approach. 
Since computational electronic structure methods are used over a very wide spectrum of 
applications is it hard to summarize their use. A hint of their versatility can be obtained 
by looking at the fraction of theoretical articles in several solid state and chemistry jour- 
nals where computational electronic structure methods are used. As can be seen from 
Table 1 this fraction varies between 11 and 59 % , being 27 % in the two largest journals 
in solid state physics (PRB) and chemical physics (JCP). 

Table 1: The importance of computational electronic structure methods as measured by 
the number of publications. Listed are the total number of publications, the number of 
theoretical publications and the number of publications using computational electronic 
structure methods in the period from January 1997 to September 1988. To determine 
whether a paper belongs into this last category, it has to contain either of the key words 
"electronic structure calculation", "ab initio calculation", "tight binding calculation" or 
"density functional calculation" . 1 thank Dr Wanitschek for providing me with these data. 
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Due to the constant increase in computer power and due to algorithmic improvements 
the importance of computational methods is growing further. Whereas computational 
methods nowadays mainly supplement experimentally obtained information, they are ex- 
pected to increasingly supersede this information. 

This article will concentrate on recently developed methods that allow us to calculate 
the total energy within various independent-electron methods for large systems. Practi- 
cally all physical observables can be obtained from the total energy, for instance in the 
form of derivatives with respect to certain external parameters. The reason why large 
systems containing many atoms are accessible with these algorithms is their linear scaling 
with respect to the number of atoms. In principle linear scaling should also be obtainable 
for true many-electron methods. For the MP2 method such an algorithms has indeed 
recently been reported (Ayala and Scuseria, 1998). 

Traditional electronic structure algorithms calculate eigenstates associated with dis- 
crete energy levels. The reason for this is probably historical since the prediction of these 
experimentally observed levels was the first big success of quantum mechanics. The dis- 
advantage of this approach is that it leads to a diagonalization problem which has a cubic 
scaling in the computational effort. Direct diagonalization (Press et ai, 1986), which was 
the standard approach in the early days of the computational electronic structure era, has 
a cubic scaling with respect to the size of the Hamiltonian matrix, i.e. with respect to 
the number of basis functions Mb. Iterative diagonalization schemes (Saad 1996), precon- 
ditioned conjugate gradient minimizations (Teter et ai, 1989, Stich et al, 1989, Payne, 
et al., 1992 ) and the Car-Parrinello method (Car and Parrinello 1985) for molecular 
dynamics simulations were a big algorithmic progress because of their improved scaling 
behavior. Their scaling was not any more proportional to the cube of the the number 
of basis functions but grew only like M;,log(Mf,) if plane waves were used as a basis set. 
Nevertheless these methods still have a cubic scaling with respect to the number of atoms 
Nat-, which comes from the orthogonality requirement of the wavefunctions. The reason 
why this orthogonalization step scales cubically can easily be seen. As the system grows, 
each wavefunction extends over a larger volume and has therefore to be represented by 
a larger basis set resulting in a longer vector. At the same time there are more such 
wavefunctions and each wavefunction has to be orthogonalized to all the others. Thus 
there are 3 factors that grow linearly, resulting in the postulated cubic behavior. The 
computer time Tcpu required to do the calculation is thus given by 

TcPu = c,Nl , (1) 

where C3 is a prefactor. It has to be pointed out that Equation (|ip gives only the asymp- 
totic scaling behavior. Within Density Functional and Hartree Fock calculations there 
are other terms with a lower scaling which dominate for system sizes of less than a few 
hundred atoms due to their large prefactor. In the case of plane wave type calculations 
the Fast Fourier transformations necessary for the application of the potential to the 
wavefunctions consume most of the computational time for small systems, in the case of 
calculations using Gaussian type orbitals (Hehre 1996) it is the calculation of the Hartree 
potential. This cubic scaling is a major bottleneck nowadays since in many problems of 
practical interest one has to do electronic structure calculations for systems containing 
many (a few hundred or more) atoms. Evidently, cubic scaling means that if one dou- 
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bles the number of atoms in the systems the required computer time will increase by a 
factor of eight. By enlarging the system one therefore rapidly reaches the hmits of the 
most powerful computers. So called 0(N) or low complexity algorithms are therefore a 
logical next step of algorithmic progress since they exhibit linear scaling with respect to 
the number of atoms 

Tcpu = CiKt . (2) 

These methods offer thus the potential to calculate very large systems. The prefactors Ci 
and C3 depend on the approximation used for the many-electron problem. For a Density 
Functional calculation with a large basis set the prefactors are of course much larger than 
for a Tight Binding calculation, where the number of degrees of freedom per atom is much 
smaller. The prefactor ci depends also on what 0(N) method is used, but in general 
the prefactor Ci is always larger than C3 assuming that the same independent-electron 
approximation is used both in the traditional and 0(N) version. There is therefore a so 
called cross over point. For system sizes smaller than the cross over point the traditional 
cubic scahng algorithms are faster, for larger systems the 0(N) methods win. Tight 
Binding calculations are an ideal test emvironment for 0(N) algorithms. Because of their 
rather small memory and CPU requirements one can easily treat systems comprising 
of a very large number of atoms and venture into regions beyond the cross over point. 
Contrary to what one might naively think, the importance of 0(N) algorithms will also 
increase as computers get faster. Whereas at present it is difficult to access the cross over 
region situated at some 100 atoms using the Density Functional framework, this will be 
easy with faster computers and 0(N) algorithms will be the algorithms of choice. 

Even though 0(N) algorithms contain many aspects of mathematics and computer 
science they have nevertheless deep roots in physics. Obtaining hnear scahng is not 
possible by purely mathematical tricks but it is based on the understanding of the concept 
of locality in quantum mechanics. Conversely, the need of constructing 0(N) algorithms 
was also an incentive to investigate locality questions more deeply, and has thus lead to 
a better understanding of this very fundamental concept. An algorithmic description of 
electronic structure in local terms can give a justification of the well established concepts 
of bonds and lone electron pairs in empirical chemistry. 

Since 0(N) algorithms are based on a certain subdivision of a big system into smaller 
subsystems, techniques developed in this context might also be helpful in reaching another 
important goal for treating large systems, namely combining electronic structure methods 
of different accuracy such as empirical Tight Binding and Density Functional theory in a 
single system. 

2 Locality in Quantum Mechanics 

Locality in Quantum Mechanics means that the properties of a certain observation region 
comprising one or a few atoms are only weakly influenced by factors that are spatially far 
away form this observation region. This fundamental characteristic of insulators is well 
estabhshed within independent-electron theories (Heine 1980) and it can even be carried 
over into the many-electron framework (Kohn 1964). 
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Traditional chemistry is based on local concepts. Covalently bonded materials are 
described in terms of bonds and lone electron pairs. It is standard textbook knowledge 
that the properties of a bond are mainly determined by its immediate neighborhood. The 
decisive factors are what type of atoms and how many of them (the coordination number) 
are surrounding it. Second nearest neighbors and other more distant atoms have a very 
small influence. As an example let us look at the total energy of a hydrocarbon chain 
molecule C„if2n+2- In this case each CH2 subunit is from an energetical point of view 
practically an independent unit. As one adds one CH2 subunit, the energy increases by 
an amount which is nearly independent of the chain length. Already the insertion of a 
CH2 subunit into the smallest chain C2HQ gives an energy gain which agrees within 10~^ 
a.u. with the asymptotic value of the insertion energy for very long chains. This means 
that the electrons belonging to this inserted subunit already do not see any more the 
end of the chain for very short chain lengths. This example is a drastic illustration of a 
principle sometimes termed "nearsightedness" (Kohn 1996). In other insulating materials 
the influence of the neighboring atoms decays slower. An example is shown in Figure (|ip, 
where the total energy per silicon atom is plotted as a function of the size of its crystalline 
environment. 
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Figure 1: The deviation of the total energy per silicon atom from its asymptotic hulk value 
as a function of the size of the periodic volume in which it is embedded. The calculation 
was done with a Tight Binding scheme using exact diagonalization 




Even in metallic systems, where the elementary bond concept is not any more valid, 
locality still exists. This is supported by the well known fact, that the total charge density 
in a metal is given with reasonable accuracy by the superposition of the atomic charge 
densities. Since atomic charge densities decay rapidly, this implies that the charge density 
at the midpoint of two neighboring atoms is mainly determined by the two closest atoms 
and very little by other more distant atoms. Another related example is given by V. 
Heine (Heine 1980) who points out, that the magnetic moment of an iron atom, which is 
embedded in an iron-aluminum alloy differs by less than 5 % from the value for pure iron 
if the atoms are locally surrounded by only eight aluminum atoms. 

This locality is not at all reflected in standard electronic structure calculations which 
are based on eigenorbitals extending over the whole system, making both the interpre- 
tation of the results more difficult and requiring unnecessary computational effort. The 
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simplistic bond concepts of empirical chemistry are certainly not adequate for electronic 
structure calculations aiming at high accuracy. Nevertheless one might hope to incor- 
porate some more general locality concepts into electronic structure calculation to make 
them both more intuitive and efficient. In the following we will therefore carefully examine 
the range of interactions in quantum mechanical systems. 

Self-consistent electronic structure methods require essentially two steps. The calcu- 
lation of the potential from the electronic charge distribution and the determination of 
the wavefunction for a given potential. In non-self-consistent calculations such as Tight 
Binding calculations, the first step is not needed. 

The calculation of the potential consists usually of two parts, the exchange correlation 
potential, and the Coulomb potential. The exchange correlation potential is a purely 
local expression in Density Functional Theory and can therefore be calculated with linear 
scaling. In the Hartree Fock scheme one might first think that the exchange part is non- 
local, but a more profound examination reveals (section |8.1| ) that it is local even in this 
case. The Coulomb potential on the other hand is very long range and needs proper 
treatment. A naive evaluation of the potential U arising from a charge distribution p by 
subdividing space into subvolumes AV and summing over these subvolumes, 

would result in a quadratic scaling since both indices i and j have too run over all grid 
points in the system. The Coulomb problem actually arises not only in the context 
of electronic structure calculations but also in classical calculations of coulombic and 
gravitational systems such as galaxies of stars. Much effort has therefore been invested 
in this computational problem and several algorithms are known which solve the problem 
with linear scaling. These methods will be described in section |0. 



The more interesting and more difficult part is to assess the role of locality for a given 
external potential. The appropriate quantity to study this property is the density matrix. 
The one-particle density matrix F completely specifies our quantum mechanical system 
within the independent electron approximation and all quantities of interest can easily 
be calculated from it. The central quantities in any electronic structure calculation, the 
kinetic energy Ekin, the potential energy Epot and the electronic charge density p are given 
by 

Ekin = -I [ V?F(r,r') _ dr' (3) 
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Epot = j F{v',v')U{v')dv' (4) 
p(r) = F(r,r), (5) 

where U{t') is the potential. A related quantity which will frequently be used throughout 
the article is the band structure energy Ebs defined as 

EbS = Ekin + Epot (6) 

and the grand potential 

n = EBs~ l^Nei , (7) 
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where n is the chemical potential and N^i the number of electrons. Subtracting jj^Nei 
from Ebs leaves Q invariant under a constant potential offset. If one applies the shift ( 
U{r) U{r) + const) the potential energy will increase by Ngi const. In order to conserve 
the total number of electrons, /i also has to be shifted (/i — > ^ + const) and thus Q remains 
constant. 

Discretizing the Hamiltonian H which is the sum of the kinetic and potential energy 
as well as F with respect to a finite orthogonal basis 0j(r), i = 1, one obtains 

H,, = I </.:(r) (-iVr^ + t/(r)) </.,.(r)dr (8) 
F,,, = j j cP*{v)F{v,v')<P,{v')dvdv' (9) 
and the expressions for the central quantities become 

Ebs = Tr[FH] (10) 
n = Tr[F{H-fiI)] (11) 

p(r) = 5:F,,0,(r)0,(r), (12) 

where Tr denotes the trace. It follows from Equation (|I2|) that the total number of 
electrons N^i in the system is given by 

Nei = Tr[F] . (13) 

Evaluating the traces using the eigenfunctions \I'„ of the Hamiltonian one obtains 
immediately the well known expressions for A^^e;, Ebs, ^ and p within the context of 
conventional calculations which are based on diagonalization. Denoting the eigenvalues 
associated with the eigenfunctions by e„ one obtains 

Nel = E/(e") (14) 

n 

Ebs = E/(e",)en (15) 

n 

^ = f i^n) -f^) en = J2fi^n) en -pN,i (16) 

n n 

P{r) = T.fi^n)Kir)^n{r). (17) 

n 

The function / is the the Fermi distribution 

1 . . 



fie) 



1 + exp( 



where kB is Boltzmann's constant and T the temperature. If we talk about temperature 
in this article, we always mean the electronic temperature since we are not considering the 
motion of the ionic degrees of freedom which might be associated with a different ionic 



temperature. In the expressions (|Tj)) (HID) (|T6D , and (^7\), as well as in the remainder of 



the whole article, we will use the convention, that all the subscripts indexing eigenvalues 
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and eigenf unctions are combined orbital and spin indices, i.e. that we can put at most one 
electron in each orbital. This will eliminate bothering factors of 2. The usually relevant 
case of an unpolarized spin restricted system can always easily be obtained by cutting 
into half all sums over these indices and multiplying by 2. 

In terms of the Hamiltonian H the density matrix is defined as the following matrix 
functional 

F = f{H). (19) 
Since F is a matrix function of H it has the same eigenfunctions \E'„ as H 

H^n = Cn^n (20) 

F^n = /(e„)^„. (21) 
The density matrix can consequently be written as 

F{r,r') = J2fi^n)Kir)^n{r'), (22) 

n 

where n runs over all the eigenstates of the Hamiltonian. From the functional form of the 
Fermi distribution it follows that the eigenvalues /(e„) are always in the interval [0:1]. At 
zero temperature the density matrix of an insulating system containing N^i electrons will 
have Nei eigenvalues of value one, all others being zero. Thus the density matrix does not 
have full rank, but only rank N^i. Hence we can write it as 

F(r,r')= E (23) 

n=occ 

where n runs now only over the N^i occupied states. It is easy to see that F{r, r') is a 
projection operator in this case 

J F(r, r")F{r", r')dr" = F(r, r') . (24) 

A new set of N^i eigenfunctions \I'"^"'(r) can be obtained by any unitary transformation 
of all the Nei degenerate eigenfunctions \E'n(i") associated with eigenvalues one, 

vl>r (r) = E Un,„.^Ur) , (25) 

m=occ 

where f/ is a unitary N^i by Nei matrix. In the case of a crystalline periodic solids such 
a transformation can be used to generate the localized Wannier functions (Blount) from 
the extended eigenfunctions We will refer to any set of orthogonal exponentially 
localized orbitals which can be used to represent the density matrix according to Equa- 
tion (|23| ) as Wannier functions. How to construct an optimally localized set of Wannier 
functions by the minimization of the total spread J2n < '"^ >« — < i" >^ in a crystalline 
periodic solid has recently been shown by Marzari and Vanderbilt (1997). It has been 
well known in the chemistry community (Chalvet 1976) that sets of maximally localized 
orbitals give excellent insight into the bonding properties of systems. In addition to the 
spread criterion used by Marzari et al. there are still other criteria in common use in the 
chemistry community. They are all in a certain sense arbitrary, but usually lead to the 
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Figure 2: A set of four Wannier orbitals (indicated by clouds) for the water molecule. 
The oxygen nucleus in the center is shown as a bright ball and the two hydrogen nuclei as 
nearly black balls. One sees two Wannier functions along the lines connecting the central 
oxygen with the two hydrogen atoms representing bonds as well as two lone electron pairs. 
The centers of the four Wannier functions form a nearly tetragonal structure. 

same interpretation of the bonding properties. Figure |^ shows the four Wannier functions 
for the water molecule. 

The density matrix F{r, r') is a diagonally dominant operator, whose off-diagonal 
elements decay with increasing distance from the diagonal. The exact decay behavior 
depends on the material. We will derive the decay properties within the theoretical 
framework of the description of periodic crystalline solids. For a periodic solid the density 
matrix is given by 



where \l/„^k(i") = Un,]ii^)e'^^^^^ are the Bloch functions associated with the wave vector k 
and band index n. The integral is taken over the Brillouin zone (BZ) and V is the volume 
of the real space primitive cell. 

The Wannier functions Wn of the n-th band in an insulating crystal are defined in the 




(26) 




rfk/(e„(k)Xk(r)«n,k(r')e*'^' 
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usual way 

Wn{v - R) = j^^ t/k e-^*^^ ^„,k(r) . (27) 

The Wannier functions are not uniquely defined. One can construct a different set of 
Bloch functions by multiplying them with a phase factor, \E'„ k(r) ^ e*'^''^^\E'„^k(r), where 
C(j(k) is an arbitrary function. This will obviously modify the Wannier functions. Further 
ambiguities arise in the case of degenerate bands (Blount). Because of these ambiguities 
in the construction of the Wannier functions it is advantageous to work with the density 
matrix where any phase factors cancel (Equation (p6D) and where degeneracies do not 
cause any problems since one sums over all the occupied bands. 

We will first discuss the decay properties of the density matrix in metallic systems. In 
this discussion we will assume that metals behave essentially like jellium and that exact 
results for jellium can be carried over to real metals. 

The decay properties of the density matrix of a metallic system at zero temperature 
are well known (March). Because the integral in Equation (p6D contains a discontinuity in 
the metallic case, the density matrix decays only algebraically with respect to the distance 
between r and r'. The decay is given by 

^/ ,^ , cos(A;;?|r — r'l) 

F{v,r')^kF '\ (28) 

where the Fermi wave vector kp is related to the valence electron density by ^ = ^ in 
a non-spin-polarized system. 

Introducing a finite electronic temperature T in a metal leads to a drastic change in 
this decay behavior. Instead of an algebraic decay one has a much faster exponential 
decay. As shown independently by Goedecker (1998a) and Ismail-Beigi and Arias (1998), 
the decay at low temperatures is then given by 

, cos(A;F|r - r'l) / keT \ 
^('^' ) 0^ ^Tp I - I ) ' (29) 

where c is a constant on the order of 1. We thus find oscillatory behavior with an 
exponentially damped amplitude. The decay rate depends linearly on temperature and 
the oscillatory part is described by the wave vector kp. The related correlation function 
at finite temperature exhibits the same temperature dependence of the decay rate with 
respect to temperature (Landau and Lifshitz, 1980). In an insulator finite temperature 
plays no role as long as the thermal energy ksT is much smaller than the gap, which is 
usually fulfilled. 

Let us next discuss the important case of an insulator with a band gap e^^p at zero 
temperature. We will first present some numerical results, then we will put forward some 
arguments to explain the qualitative features of the density matrix and finally discuss in 
a more quantitative way the factors which determine the exact decay rate. 

Numerical calculations of the density matrix or the related Wannier functions show an 
oscillatory behavior with a decaying amplitude. There is exactly one node per primitive 
cell and logarithmic plots of the amplitude clearly reveal an exponential decay. In the 
case of alkanes the decay of the density matrix calculated by the Hartree-Fock method has 
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been studied and plotted on a logarithmic scale by Maslem et al. (1997). Interestingly, 
the decay depends also on the basis set used. Small low quality basis sets lead to a larger 
band gap and consequently to a faster decay of the density matrix. In the case of silicon, 
treated by Density Functional theory, logarithmic plots revealing the exponential decay 
of the Wannier functions have also been done both for grid based basis sets (Goedecker 
unpublished) and atomic basis sets (Stephan 1998). Within the Tight Binding method 
the decay of the density matrix has also been studied numerically for crystalline and liquid 
carbon systems by Goedecker (1995) and for fuUerenes by Itoh et al. (1996). 

Let us now make plausible the exponential decay of the density matrix. The demon- 
stration is based on the fact, that one can express the Fourier components e„(R) of the 
band energy e„(k) through the Wannier functions Wn{r) 

e„(R) = ^J e„(k)e-^^^ = / W:{v')HW^{r' - R) dv' , (30) 

[2t[Y jbz V Jspace 

where R is a Bravais lattice vector. Now it is known, that the band energy en(k) is 
an analytic function (Blount). This is actually not surprising. The first and second 
derivatives of the band-structure have physical meaning since they are related to the 
electron velocity and effective mass. So it is to be expected that higher derivatives exist 
as well. Since the Fourier transform of an analytic function decays faster than algebraically 
(See Appendix) there exists a decay constant 7 and a normalization constant C such that 

Ce-^"" > e„(R) = ^f W:{r')HWn{r' - R) dr' (31) 

V Jspace 

It is reasonable to expect that HWn{r) will behave similarly as Wn{r). In particular 
we expect Wn{r) to be small whenever HWn{r) is small. So we will just drop H in 
Equation (0). In addition we will define this modified integral not only for lattice vectors 
R but for arbitrary vectors r to obtain. 

Ce-"''- > i / W:{r')Wn{r' - r) dr' (32) 

V J space 

If Equation (^2p holds, then one can use the mean value theorem to show that 
Ce-^' > i / W:{r')WrXr'-r)dv' 

V J space 

= ^E/ W:ir' - R')Wr.ir' - R' - r) dr' 
= ^iy;(s(r)-ROPV„(s(r)-R'-r)cir' 

R' 

= F(s(r),s(r)-r) (33) 

where the mean value s(r) is a vector within the primitive cell. Assuming that the density 
matrix has the same order of magnitude within each cell one can neglect the dependence 
of s on r to obtain the final result 

Ce"^^ > F(s, s - r) (34) 
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The numerically observed nodal structure of the density matrix can be motivated in 
a very similar way. Because of the orthogonality of the Wannier functions we have 



0= / W*{r')Wn{r' -R)dr' 

J space 



(35) 



for any non-zero lattice vector R. Doing the same sequence of transformation as in 
Equation (|33|) one obtains 

= F(s(R),s(R)-R) (36) 

So there has to be one node in each cell. The numerically calculated nodal structure for 
a 1-dimensional model insulator is shown in Figure H. 




Figure 3: The nodal structure of the density matrix for a 1-dimensional model insulator 
with a bandwidth of 4 o-u. and a hand gap of 2 a.u.. The length of the primitive cell is 1. 
The nodes predicted by Equation (WW are at the intersections with diagonal lines, two of 
which are shown by the dashed lines. 



The next step is to examine in a more quantitative way which factors determine the 
rate of this exponential decay for an insulator with a band gap egap at zero temperature. 

Cloizeaux (1964) proved the exponential decay behavior of the zero temperature den- 
sity matrix, which is a projection operator. Considering the extension of the band energy 
e„(k) into the complex k plane he found that the minimal distance of the branch points 
of e„(k) from the real axis determines the decay behavior. For the Wannier functions, 
which are closely related to the density matrix by Equation (|23|), Kohn (1959) proved the 
same decay behavior in the case of a one-dimensional model crystal. In a later publication 
Kohn (1993) claims that this distance to the real k axis should be related to the square 
root of the gap. Even though he did not present a derivation of this result, it was widely 
accepted to be generally valid. Ismail-Beigi and Arias (1998) have however shown that 
Kohn's claim is not generally valid. They demonstrated that in the Tight Binding limit 
the square root behavior can be found under certain circumstances, but that different 
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behaviors can be found as well. In the weak binding limit, where the band-structure can 
be obtained by perturbation theory from the band structure of the free electron gas, they 
showed that the dependence is actually linear. 

F(r, r') oc exp(— 7|r — r'l) where^ = cegap a (37) 

The lattice constant is denoted by a, and c is an unknown constant of the order of 1. 

The dependence of the decay rate on the size of the band gap is a rather surprising 
relation. After all it follows from Equation (|2^) that only the properties of the occupied 
bands enter into the calculation of the density matrix, whereas the size of the gap is 
not directly related to the occupied states. In the following we will give an intuitive 
explanation of the factors determining the decay rate. This explanation will again be 
based on Equation ( pO|) relating the bandstructure to the decay properties of the density 
matrix. As is known from complex analysis, the distance of the singularities from the real 
axis is comparable to the length over which one has very strong variations along the real 
axis of a complex function. Now, the long range decay properties of a Fourier transform 
are exactly determined by the length AA; of such a region of strongest variation (See 
Appendix). One thus regains Cloizeaux's result that the decay rate is proportional to the 
distance of singularities from the real axis. Let us now explain the behavior found in the 
weak binding limit by Ismail-Beigi and Arias. In the weak binding limit the effective mass 
establishes the connection between the gap and the important features of the occupied 
bands. The effective mass for the n-th band at the point ko is defined as (Kittel 1963) 

1 |/^;koWV^m,ko(r)rfrp 
^ ^ m^n e„(ko) - e™(ko) 

Since we are only interested in order of magnitudes, we have here averaged over the 
diagonal elements of the effective mass tensor in order to obtain a effective mass which 
is a scalar quantity. In the case of the weak binding limit, a gap will open up at the 
boundaries of the BriouUin zone and this gap will be small. The effective mass is therefore 
small and proportional to a^egap, where we have assumed that the dipole matrix elements 
/ ^n,ko('')^^«.ko(i^)'^i" on the order of ^. The band-structure near the boundaries of 
the BriouUin zone is then given by 

where Ak is the distance from the boundary, neglecting directional effects. Since the 
effective mass is small, the curvature of the band-structure is large in this region. Hence 
this region is just the region with the strongest variation. As is well known (Ashcroft 
and Mermin 1976), the perturbation theory arguments leading to Equation ( p9| ) are valid 
within an energy range of the order of egap- It then follows from Equation (|39|) that the 
corresponding range of Ak is egap a, confirming the linear decay of the density matrix with 
respect to the size of the gap, i.e. 7 = c egap a. 

Let us next show how a square root like behavior 7 = Cy^e^ can arise for real crystals 
with a big gap. In this case the effective mass is of the order of one at all stationary 
points ko in the BriouUin zone. Assuming that it is then of the order of one over the 
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whole Brioullin zone, the region of largest variation is just the Brillouin zone itself. The 
decay constant is therefore simply related to the lattice constant a. 

7 = c- (40) 
a 

In order to get the square root dependence of the decay constant 7, one has to assume 
that ^ 

Ggap = Cgap-^ (41) 

where Cgap is a constant which is not or only weakly dependent on the material. Such a 
behavior has indeed been observed for certain classes of materials, where the tight binding 
limit is the most appropriate one, such as ionic crystals (Harrison 1980), but with a non- 
negligible variation of Cgap across different materials. A square root behavior of 7 can 
therefore be expected if one varies the lattice constant for a certain material, but the decay 
constants for different materials that happen to have the same gap are not necessarily 
comparable. 

In practice the distinction between the Tight Binding and weak binding case may not 
always be clear. Unless the region of strongest variation is really a very small fraction of 
the whole Brioullin zone, all the prefactors which were neglected in these considerations 
might be important enough to blur out differences. The importance of these prefactors 
can also be seen from the fairly strong directional dependence of the decay rate. Ismail- 
Beigi and Arias (1998) found such a strong directional dependence in numerical tests to 
confirm the linear dependence of the decay constant on the size of the gap (Figure 
Stephan et al. (1998) found the same behavior during Tight Binding studies of carbon. So 
a statement in an old paper by Kohn (1964), namely that the decay length of the Wannier 
functions is of the order of the interatomic spacing, is for practical purposes probably in 
many cases the best available characterization of localization. 

As a numerical illustration of this surprising result, that only a small part of the 
Brillouin zone where one has the strongest variation determines the decay behavior of the 
Wannier functions, we compared the decay behavior of carbon in the diamond structure 
with " syntheticum" in the same structure. The artificial element "syntheticum" was 
computer generated within the Tight Binding context in such a way that its top part 
of the conduction band as well as the gap is nearly identical to real carbon, whereas the 
lower part of the valence band is drastically different as shown in Figure ^. More precisely. 
Carbon was characterized by the parameters of Goodwin (1991) and to obtain syntheticum 
was modified from -5.16331 to -1.16331 and Vssa was modified from -4.43338 to -2.43338. 

Figure |^ shows the decay behavior of the density matrix. As one can see the decay 
behavior is very similar in both cases. We note that not only the gap is similar but also 
the effective mass since the density of states at the top of the valence band has the same 
behavior in both materials. 

All the above arguments apply to simple and mainly periodic materials. Advanced 
electronic structure calculations however frequently study materials which are not in this 
class. The localization properties of such materials have not yet been studied system- 
atically and so there is some incertitude about which orbitals are localized and to what 
extent (Kohn 1995). If the localization properties are unknown one should better not im- 
pose any localization constraints. In this case some of the discussed 0(N) techniques still 
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Figure 4: The dependence of the decay constant 7 on the gap. Plotted are the two 
dimensionless quantities 07 versus a^egap. The variation of the gap was obtained for a 
three-dimensional cubic model crystal by varying the strength of the potential. Circles refer 
to the [100], stars to the [110] and pluses to the [111] direction. This figure is reproduced 
with kind permission of the authors from Ismail-Beigi and Arias (1998) 



give a quadratic scaling, which also allows us to gain computational efficiency compared 
to the traditional cubically scaling algorithms. 

3 Basic strategies for 0(N) scaling 

Most 0(N) algorithms are built around the density matrix or its representation in terms 
of Wannier functions and take advantage of its decay properties. To obtain linear scaling 
one has to cut off the exponentially decaying quantities when they are small enough. This 
introduces the concept of a localization region. Only inside this localization region the 
quantity is calculated, outside it is assumed to vanish. For simpUcity the localization 
region is usually taken to be a sphere, even though the optimal shape might be different 
(Stephan 1998). In the Tight Binding context the boundary of the localization region can 
either be defined by a geometric distance criterion or in terms of the number of "hops", 
i.e. the number of steps one has to do along bonds connecting neighboring atoms to reach 
this boundary (Voter et al., 1996). Different localization regions generally have significant 
overlaps. The localization regions thus do not form a partition of the computational 
volume and one atom in general belongs to several localization regions. 

In a numerical calculation the density operator F(r, r') is discretized with respect to 
a basis. The basis set has to be chosen such that the matrix elements Fi^j reflect the 
decay properties of the operator F(r, r'). This will obviously only be the case if the basis 
set consists of localized functions, such as atom centered Gaussian type basis functions. 
Sets of orthonormal basis functions usually facilitate the calculations. Unfortunately all 
currently used localized basis sets are non-orthogonal. In the context of the orthogonal 
Tight Binding scheme (Goringe et al., 1997a, Majewski and Vogl 1986) one just assumes 
the existence of an basis set which is both atom centered and orthogonal. Since only 
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Figure 5: Comparison of the density of states of carbon and syntheticum. As one sees 
both have roughly the same gap. The lower part of the valence band is however drastically 
different. The valance band of syntheticum is roughly only half as wide as the one of 
carbon 
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Figure 6: Comparison of the decay behavior of the density matrices for carbon and 
syntheticum. They are both very similar. The moderate scattering comes from the fact 
that the density matrix does not decay equally fast in all directions 
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the parameterized Hamiltonian matrix elements enter in the calculation, there is no need 
to explicitly ever construct such a basis set. In the following sections, we will follow 
this practice and assume in all relevant parts that we are dealing with such a localized 
orthogonal basis set. The non-orthogonal case will be discussed in section |^. Whenever 
we refer from now on to a localization region, we actually mean the subset of all basis 
functions which are contained within this spatial localization region. 

Obviously the size of the localization region needed to obtain a certain accuracy de- 
pends on the decay properties of the density matrix as well as on the selected accuracy 
threshold. It also depends on the quantity one wants to study. Generally, the total energy 
as well as derived quantities such as the geometric equilibrium configurations are surpris- 
ingly insensitive to finite localization regions, because these quantities are not strongly 
influenced by the exponentially small tails which are cut off by the introduction of a lo- 
calization region. This insensitivity also holds true, even though to a much lesser extent, 
for metals. As we have seen above the introduction of a finite temperature leads to an 
exponential decay of the density matrix which in turn justifies truncation. In a metal, 
the difference between the finite and the zero temperature total energy /S.E is propor- 
tional to the square of the temperature, AE oc T^, (Ashcroft and Mermin 1976) and 
thus rather small. There are however quantities which are very sensitive to finite localiza- 
tion regions. In the modern theory of polarization in solids (King-Smith and Vanderbilt 
1989), the polarization can be expressed in terms of the centers of the Wannier functions 
/ W {v)vW {Y)dY . Using this formula (Fernandez et ai, 1997) one has a strong influence of 
the tails of the Wannier functions because they get strongly weighted by the factor of r in 
the integral. Since the tails are much more influenced by the boundary of the localization 
region than the central part, this quantity is more sensitive to the size of the localization 
region. 

There are even quantities which are not at all directly accessible by a solution which 
is given in terms of density matrices or Wannier functions. The Fermi surface in a metal 
which can be calculated via the eigenvalues of the band structure e„(k) is such an example. 

It is also clear that one can gain significant computational efficiency only if the size of 
the system is larger than the size of the localization region. When this criterion is fulfilled 
depends not only on the decay properties of the density matrix of the system but also on 
its dimensionality. In the case of a linear chain molecule with a large band gap, it might be 
enough to have a localization region containing just two neighboring atoms on each side. 
So the localization region would just contain 5 atoms and for systems larger than 5 atoms 
one might potentially gain computational efficiency by using an 0(N) method. If one 
has a 3 dimensional system with a comparable gap, then a spherical localization region 
extending out to the second neighbors would contain some 60 atoms and the crossover 
point would already be much larger. For a system with a small gap such as silicon or for 
metallic systems the crossover point is even larger. 

There are essentially six basic approaches to achieve linear scaling. 



The Fermi Operator Expansion (FOE) is based on Equation (|T^). In this approach 
one finds a computable functional form of F as a function of H to build up the 
density matrix. Two possible representations based on a Chebychev expansion and 
a rational expansion will be discussed. 
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• The Fermi Operator Projection (FOP) is closely related to the FOE method. The 
computable form of F is however not used to construct the entire density matrix 
but to find the space spanned by the occupied states, i.e. the space corresponding 
to the eigenfunctions associated with the unit eigenvalues of the Density matrix at 
zero temperature. These eigenfunctions can be considered as Wannier functions in 
the generalized sense defined before. 

• In the Divide and Conquer (DC) method for the density matrix the relevant parts of 
the density matrix are patched together from pieces that were calculated for smaller 
subsystems. 

• In the Density Matrix Minimization (DMM) approach, one finds the density matrix 
by a minimization of an energy expression based on the density matrix. 

• In the Orbital Minimization approach (OM), one finds a set of Wannier functions 
by minimization of an energy expression. 

• The Optimal Basis Density Matrix Minimization scheme (OBDMM) contains as- 
pects of both the OM and DMM methods. In addition to finding a density matrix 
with respect to the basis, one also finds an optimal basis by additional minimization 
steps. The number of basis functions has to be at least equal to the number of 
electrons in the system, but can be bigger as well. 

A major difference between these methods is whether they calculate the full density 
matrix or only its representation in terms of Wannier functions. The later approach 
applies only to insulators while the former is in also applicable to systems with fractional 
occupation numbers (i.e. /(e„) is not either 1 or 0) such as metals or systems at finite 
electronic temperature. 

In the following each of these six approaches will be presented in detail. 

3.1 The Fermi Operator Expansion 

The FOE (Goedecker and Colombo 1994, Goedecker and Teter 1995) is the most straight- 
forward approach for the calculation of the density matrix. The basic idea in this approach 
is to find a representation of the matrix function (|19]) which can be evaluated on a com- 
puter. Several such representations are possible. We will discuss a Chebychev and a 
rational representation. 

3.1.1 The Chebychev Fermi Operator Expansion 

One of the most basic operations a computer can do are matrix times vector multiplica- 
tions. The simplest representation of the density matrix would therefore be a polynomial 
representation 

F ^ p{H) = Col + CiH + c^H^ + ... + c„^,i7"-' . 

where / is the identity matrix. Unfortunately polynomials of high degree become nu- 
merically unstable. This instability can however be avoided by introducing a Chebychev 
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polynomial representation, which is a widely used numerical method (Press et al., 1986) 



p{H) = ^I + Y.c,T,{H). (42) 

Since the Chebychev polynomials are defined only within the interval [-1:1], we will as- 
sume in the following that the eigenvalue spectrum of H falls within this interval. This 
can always be easily achieved by scaling and shifting of the original Hamiltonian. The 
Chebyshev matrix polynomials Tj{H) satisfy the recursion relations 

To{H) = I (43) 
Ti{H) = H (44) 
T,+i(/7) = 2/7T,(iJ)-T,_i(/J). (45) 

The expansion coefficients of the Chebychev expansion can easily be determined. The 
eigenfunction representation (Equation (|2ll) ) of F is, 

< >= /(e„) 5n,m ■ (46) 

Evaluating the polynomial expansion in the same eigenfunction representation we obtain 

< ^„|p(iy)|^„ >= p{en) 6n,m , (47) 

where 

"pi 

P{^) = i + T.^3U^)- (48) 

Comparing Equation (|i6| ) and Equation (^) we see that the polynomial p(e) has to 
approximate the Fermi distribution in the energy interval [-1:1] where the scaled and 
shifted Hamiltonian has its eigenvalues. How to find the Chebychev expansion coefficients 
for a scalar function is described in standard textbooks on numerical analysis (Press et 
al., 1986). Actually it is not necessary to take the exact Fermi distribution. In practically 
all situations one is interested in the limit of zero temperature. Hence any function 
which approaches a step functions in the limit of zero temperature can be used. In 
the case of simulations of insulators for instance it is advantageous to take the function 
/(e) = |(1 — erf(^^)) shown in Figure |^ since it decays faster to respectively 1 away 
from the chemical potential. The term Fermi distribution will in the following always be 
used in this broader sense. The energy resolution Ae is chosen to be a certain fraction 
of the size of the gap (Goedecker and Teter 1995). In the case of metals, Ae is chosen 
by considerations of numerical convenience. Large values of Ae will give lower accuracy 
results. However as pointed out before, the convergence of the total energy with respect 
to Ae is quadratic and so highly accurate total energies can be obtained with rather 
high values of Ae (Goedecker and Teter 1995). Small values of Ae make the calculation 
numerically expensive. The detailed scaling behavior of the numerical effort in the limit 
of vanishing gaps is analyzed in section |, where it is found that actually the increase in 
the size of the localization region is the limiting factor in all methods. 



20 




-25 -20 -15 -10 -5 5 10 15 



ENERGY in eV 

Figure 7: The Fermi distribution as obtained by a Chebychev fit of degree 40 in the case 
of a diamond structure. The bandgap is in between the two vertical lines. 



Even if one wants to study electronic properties in the limit of zero electronic temper- 
ature it is important that one nevertheless uses a finite temperature Fermi distribution for 
the Chebychev fit. Using the zero temperature step function introduces so-called Gibbs 
oscillations in the fit and spoils the Chebychev fit over the whole interval. 

How to eliminate these Gibbs oscillations in the zero temperature case by the so called 
kernel polynomial method (Voter et al, 1996, Silver et a/., 1996) can be used as a starting 
point for an alternative derivation of the FOE method. The basic idea is to expand 
a delta function as a polynomial using damping factors to suppress large oscillations. 
This representation of an approximate delta function can then be integrated to obtain 
a smooth representation of the Fermi distribution. Used this way the kernel polynomial 
method is thus just another way to derive the expansion coefficients for the Chebychev 
expansion (Kress et al, 1998). In addition the kernel polynomial method can also be used 
to smear out the density of states rather than the zero temperature Fermi distribution 
resulting in a method with practically identical computational requirements but some 
slightly different properties. One useful property is that the smeared density of states 
energy is an approximate lower bound to the energy, whereas the smeared Fermi energy 
is an approximate upper bound (Voter et al., 1996). 

Coming back to the original motivation for a polynomial representation, let us now 
show how the density matrix can be constructed using only matrix times vector multi- 
plications. Let us denote by tj the 1-th column of the Chebychev matrix Tj. Now each 
column of these Chebychev matrices satisfies the same recursion relations 

\ei > (49) 
H\ei > 

2H\t\ > -Itf^ > . 

where e/ is a unit vector that has zeroes everywhere except at the l-th entry. So Equa- 
tion ( ^91) demonstrates that we indeed need only matrix vector multiplications. Once we 
have generated the l-th columns of all the Chebychev matrices, we can obtain the l-th 



\t^> 
\t}> 



> 
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column // of the density matrix just by forming linear combinations 



ripi 

l/^>=?|t?>+E9l^i > • (50) 
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As we have described the method so far it has a quadratic scaling instead of the linear 
scaling we finally want to achieve. If we have Mf, basis functions, the density matrix is a 
Mf, X Mfe matrix and we have to calculate full columns. For the calculation of each 
column, we have to do Upi matrix times vector multiplications, each of which costs Mf,nH 
operations assuming the matrix H is a sparse matrix with uh off-diagonal elements per 
row/column. So the total computational cost is Upi hh- The degree of the polynomial 
Upi and the width uh of the Hamiltonian are independent of the size of the system, whereas 
Mb is proportional to the number of atoms in the system. The overall scaling with respect 
to the number of atoms is therefore quadratic. 

In order to do the correct shifting and scaling of the original Hamilton to map its eigen- 
value spectrum on the interval [-1:1] we have to know its lowest and highest eigenvalues 
emin and e^ax- In addition we have to know the chemical potential fi. There are auxiliary 
matrix functions of H that can help us to determine these quantities. These functions of 
H can be build up in the same way as the density matrix. Since the recursive build up of 
the Chebychev matrices is the most costly part, the additional cost for evaluating other 
functions is negligible. To determine whether we have a vanishing density of states beyond 
an energy e^p we can for instance construct a Chebychev fit Pup(e) to a function which is 
zero (to within a certain tolerance) for energies below eup, but blows up for energies larger 
than eup- If Tr\pup{H)] does not vanish we have a non- vanishing density of states beyond 
eup. A similar procedure can be applied to determine a lower bound for the density of 
states. The determination of the chemical potential in an insulator can be done along the 
same lines as well (Bates and Scuseria 1998). Without any significant extra cost one can 
build up several Fermi distributions with different chemical potentials until one finds the 
correct chemical potential leading to charge neutrality. In a metallic system the search for 
the chemical potential can be accelerated since it is possible to predict with high accuracy 
how the number of electrons changes in response to a change in the chemical potential. 
From Equation (|13|) it follows 



dNel_ 



-Tr[p'{H)] , (51) 



where p' is the derivative of the Chebychev polynomial p that approximates the Fermi 
distribution. The Chebychev expansion coefficients of p' can be calculated from the coeffi- 
cients for p (Press et al., 1986). Using the finite difference approximation of Equation (|5lD, 



it is possible to find the correction A/i to the chemical potential which will nearly exactly 
eliminate an excess of ANei electrons due to an incorrect initial chemical potential. The 
correct chemical potential in a metallic system can thus be found with very high accuracy 
with a few iterations. 
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The desired linear scaling can be obtained by introducing a localization region for 
each column, outside of which the elements are negligibly small. For the k-th column, 
this localization region will be centered on the k-th basis function. If we use atom centered 
basis functions, then the localization region will consequently be centered on the atom 
to which this k-th basis function belongs. We have then to calculate only that part of 
each column which corresponds to this localization region. This means that we can use 
a truncated Hamiltonian H{k) which retains only the matrix elements corresponding to 
the basis functions contained within the localization region k. Denoting the number of 
basis functions in this region by Mioc (which might actually depend on the localization 
region k being considered), the overall computational cost is then MbMiocUpinH and thus 
scales linearly. Let us stress, that the size of the localization region is independent of 
the degree of the polynomial. If one uses for instance a polynomial of degree Upi = 50, 
the recursion in Equation (^) will extend over the 50 nearest neighbor shells without 
localization constraint for a Hamiltonian coupling only nearest neighbors. The localization 
region however is typically much smaller comprising just a few nearest neighbor shells. 
Imposing a localization region introduces some subtleties. For instance the eigenvalues of 
the truncated density matrix are not anymore exactly given by p(e„) and F is not any 
more strictly symmetric. More importantly, strictly speaking we can no longer use the 
Trace notation, since we use different local Hamiltonians H[k) to build up the different 
columns of the density matrix. The band-structure energy Ebs has now to be written as 



(53) 



Another important quantity are the forces. The force acting on the a-th atom at 
position Ra is obtained by differentiating the total energy with respect to these positions. 
The total energy consists of the band structure part and possibly other contributions. We 
will only discuss the non-trivial part of the force arising from the differentiation of the 
band structure energy Ebs- For simplicity let us assume that we have a simple polynomial 
expansion and not a Chebychev expansion. Let us also assume that we calculate the full 
density matrix, i.e. that we do not truncate H by introducing a localization region. We 
then obtain 
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Let us consider for instance the term for which z/ = 2 
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(54) 



(55) 



where we used that Tr[74i?] = Tr[BA]. The final result for the force, which also holds in 
the case of a Chebychev expansion, is thus 



dEBs 
dRn 



Tr 



{p{H) + Hp'{H)) 



dH 
MI 



(56) 



In the case of an insulator, the second term in the brackets Hp'{H) is very small compared 
to the first term p{H) at small but finite temperatures and it vanishes in the limit of zero 
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temperature. The reason for this is that the eigenvalues of the matrix p'{H) are p'{en)- 
Since at zero temperature p'{e) is nonzero only at the chemical potential which is in the 
middle of the gap, all eigenvalues are zero and the matrix is identically zero. Nevertheless 
it is recommendable to retain this term in numerical calculations because it leads to forces 
consistent with the total energy. 

In the case where we calculate only part of the density matrix, i.e. where we have 
a truncated Hamiltonian H{k) going with the energy expression (|53D we cannot use the 
properties of the trace to simplify the force expression as we did in Equation (0). The 
equation corresponding to Equation (^) therefore reads 

E [HmUH{k)h,j2(^§^] + (57) 

k,jl,j2,k \ ^^<^ I j2,k 

Similar results hold for all the other terms with different values of u. In the case of a 
Chebychev expansion the situation is completely analogous, just the formulas are more 
complicated. The force formula has been worked out in this case by Voter et al. (1996) 
and is given by 

dj^ = +£(1 + *.)(! + k,_.-mH)§-T,-UH) . (58) 

where kj = if j < and kj = 1 otherwise. In the typical Tight Binding context 
is a very sparse matrix. If it contains no non-zero elements, we need of the order of 
ripi no Mb operations to evaluate all the forces according to Equation (0). The error 
incurred by using the approximate formula (^) based on the trace is usually negligible if 
the localization region is large enough. Since the approximate formula can be evaluated 
with order npi noM^ operations, it might actually be preferable to do so. In a molecular 
dynamics simulation, the largest deviations in the conservation of the total energy come 
from events where atoms enter or leave localization regions and this kind of error is not 
taken into account by either force formula. 

All the above force formulas were derived for the case where we have a constant 
chemical potential and where the polynomial representing the Fermi distribution does 
thus not change. Frequently one wants however to do simulations for a fixed number of 
electrons rather than for a fixed chemical potential. In this case one has to readjust the 
chemical potential for each new atomic configuration. The chemical potential is thus a 
function of all the atomic positions /z = fi{Ra), but the explicit functional form of this 
dependence is not known. The force formula can however also be adapted to this case (B. 
Roberts and P. Clancy 1998). Ignoring the above warnings and using again trace notation 
for simplicity we have 

Ebs = Tr[H p{H ~ fil)] (59) 
N,i = Tr\p{H~^iI)] (60) 
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and consequently 
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(61) 
(62) 



Since has to be equal to zero, we can solve Equation (|6^) for and insert it into 
equation (O) to obtain the force under the constraint of a constant number of electrons. 

Let us finally derive a force formula for the case where a local charge neutrality condi- 
tion is enforced (Kress et al., 1998). The motivation for this approach is that in non-self- 
consistent Tight Binding calculations one frequently finds an unphysically large transfer of 
charge between atoms. In a self-consistent calculation the electrostatic potential, built up 
by a charge transfer, is counteracting a further charge flow and thus limits charge transfer 
to reasonably small values. Some Tight Binding schemes (Horsfield et al., 1996a) enforce 
a so-called local charge neutrality condition requiring that the total charge associated 
with an atom in a molecule or solid be equal to the charge of the isolated atom. This is 
done by determining a potential offset Ua for each atom a in the system which will ensure 
this neutrality. The total Hamiltonian H of the system is then given hy Hq + U where 
Hq is the Hamiltonian without any potential bias and U a diagonal matrix containing the 
atomic potential offsets Ua- The band structure energy is given by 



Ebs = Tr[{Ho + U) p{H, + f/)] - ^ 



(63) 



where the term containing the atomic valence charges Qa has been subtracted to make 
the expression invariant under the application of a uniform potential bias to all atoms in 
the system. Expressed in terms of the density matrix the local charge neutrality condition 
becomes 

I 

In Equation (0) we have labeled the basis functions by a composite index where a 
indicates on which atom the basis function is centered and where / describes the character 
of the atom centered basis function. If we have carbon atoms, for which Qa = 4, / would 
for instance denote the 4 orbitals 2s, 2px, 2py, 2pz. Using Equation (|5), Equation (|63| ) 
then simplifies to 



EBs = Tr[Hop{Ho + U)]. 



Taking the derivative we get 
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As discussed above, the matrix p'{H) is close to zero in an insulator at sufficiently low 
temperature and can often be neglected. The forces are therefore approximately given by 



dE 



dRr, 



BS 



Tr 



Hop{H) 



OH 
MI 



(68) 



It has to be pointed out that to get sufficiently high accuracy the degree of the polynomial 
has to be higher than in the case of Tight Binding case without local charge neutrality. 

The degree npi of the polynomial needed to represent the Fermi distribution is pro- 
portional to 

(69) 



Ae 

This follows from the fact, that the nth order Chebychev polynomial has n roots and so a 
resolution that is roughly proportional to 1/n. For the usual Tight Binding Hamiltonians 
the ratio in Equation ([69| ) is not very large and for silicon and carbon systems without 
gap states polynomials of degree 50 are sufficient. In contexts other than Tight Binding 
this ratio can however be fairly large and polynomial representation would become very 
inefficient. 



3.1.2 The Rational Fermi Operator Expansion 

A rational representation of the density matrix (Goedecker 1995) is in this case more 
efficient 

As is well known, the function /(e) given by 

ZTTz J e — z 

is equal to 1 if e is within the volume encircled by the contour integration path and zero 
otherwise. If the integration path contains the occupied states as shown in Figure (||) it 
can therefore be used as a zero temperature Fermi distribution. The exact finite temper- 
ature Fermi distribution /(e„(k)) can be obtained by replacing the path in this contour 
integral by a sequence of paths which do not intersect the real axis (Goedecker 1993, 
Gagel 1998, Nicholson and Zhang 1997). Actually, as already mentioned above, it is usu- 
ally not necessary to have the exact Fermi distribution. The electronic temperature is just 
determined by the slope (and possibly some higher derivatives) of the distribution at the 
Fermi energy. We will also refer to such generalized distributions as Fermi distributions. 
A distribution of this type can be obtained by discretizing the zero temperature contour 
integral from Equation (0) as shown in Figure ^. 

In principle any other set of ZyS and w^s can be used as long as it satisfies 

/(e) - E ^ > (72) 



26 



occupied " unoccupied 



^ states ^ states energy 

integration path 

Figure 8: A discretization of the contour integral in the complex energy plane of Equa- 



tion (71). The resulting Fermi distribution is shown on top. 



where Upd is the degree of the rational approximation. How can we now evaluate Equa- 
tion ( ffOD on a computer? Denoting by F^, we have 

{H-z,)F, = I (73) 
F = Y.^,F,- (74) 

So we have first to invert all the matrices H — and then to form linear combinations 
of them. The inversion is equivalent to the solution of Mi, linear systems of equations. 
This can be effectuated using iterative techniques so that in the end everything can again 
be done by matrix times vector multiplications. A rational approximation can represent 
the sharp variation near the chemical potential of a low temperature Fermi distribution 
in a more efficient way than a Chebychev approximation. Whereas in the Chebychev 
case the degree of the polynomial is given by Equation ( pUD the degree of the rational 
approximation ripd is given by 

i^pd = — — • (75) 

This Upii in contrast to Upi does not depend on the largest eigenvalue e^ax- Once 
is of the order of magnitude given by Equation ( fTSD one has exponential convergence to 
the zero temperature Fermi distribution. In the case where the integration points and 
weights are obtained by discretizing the contour integral of Figure ^ this exponential 
behavior is immediately comprehensible since an equally spaced integration scheme gives 
exponential convergence for periodic functions (Sloan and Joe, 1994). Since Upd is usually 
reasonably small, the success of the method will hinge upon whether it is possible to solve 
the linear system of equations associated with each integration point with a small number 
of iterations. The number of iterations in an iterative method such as a conjugate gradient 
scheme (Press et ai, 1986) is related to whether it is possible to find a good preconditioning 
scheme. In the case of plane wave calculations a good preconditioner can be obtained from 
the diagonal elements and of the order of 10 iterations are required. In other schemes 
using Gaussians for instance it is not quite clear whether good preconditioners can be 
found. When the Hamiltonian depends on the atomic positions Ra, Equations (|73|) and 
(fr^) can be differentiated to obtain the derivative -j^, which is needed for the calculation 
of the forces. 
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3.2 The Fermi Operator Projection Method 



The FOE method is used to calculate the full density matrix. This can be inefficient if the 
number of basis functions per atom is very large. As was mentioned before, the density 
matrix at zero temperature does not have full rank. In the case of an insulator it can 
be constructed from N^i Wannier functions (|23|). If one has a numerical representation 
of the zero temperature density operator, which is actually a projection operator, that 
eliminates all components belonging to eigenvalues above the Fermi level, one can apply it 
to a set of trial Wannier functions V^, n = 1, N^i to generate a set of orbitals which span 
the space of the Wannier functions. The numerical representation of the density operator 
can again either be a Chebychev or rational one. We will ffist discuss the rational case 
(Goedecker 1995). 

To do the projection with a rational representation, a system of equations analogous to 
([7^) and ( [7iD has to be solved for each trial Wannier function Vn and at each integration 
point v 

{H-Z,)Wn,u = Vn (76) 

Wn = Y.^.Wn,,. (77) 

Thus the saving comes from the fact that one has to solve this system of equations ([7B| ) just 
for right hand sides, whereas one has M\, right hand sides in Equation ([731). Obviously 
the solution of the Equation ([76| ) has to be done not within the whole computational 
volume but only within the localization region to obtain linear scaling. The functions 
Wn will now span our subspace unless one of our trial functions Vn was chosen in such 
a way that it has zero overlap with the space of the occupied orbitals, which is highly 
unlikely. To obtain a set of valid Wannier functions Wn one has still to orthogonalize the 
orbitals Wn- Since the Wn^s, are localized the overlap matrix is a sparse matrix and can 
be calculated with linear scaling. In the typical Density Functional context, the inversion 
of this matrix is a rather small part, even if it is done with cubic scaling. In a Tight 
Binding context it is much more important and a linear scaling method has been devised 
by Stephan et al. (1998) for the inversion. The construction of the Wannier functions by 
projection according to Equations (|76[) and (^) is illustrated in Figure ^ in the case of 
a silicon crystal. In this case one knows that the Wannier functions are bond centered 
and it is therefore natural to choose a set of bond centered functions as an initial guess. 
In this example we took simple Gaussians. As shown in Figure ^ the projection modifies 
the details of the Gaussian but does not significantly change its localization properties. 

Chebychev based projection methods have been introduced in connection with other 
techniques by Sankey et al. (1994) and by Stephan et al. (1998). 

3.3 The Divide and Conquer method 

The original formulation of the DC method (W. Yang 1991a, W. Yang 1991b, Zhao and 
Yang 1995) was based on a subdivision of the electronic density. To calculate the density 
at a certain point an ordinary electronic structure calculation is done for a sub-volume 
containing this point. Since the electronic density is only influenced by features in a 
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Figure 9: The effect of applying the density operator, which is a projection operator 
in the eigenvalue space, to a Gaussian (dashed line) centered in the middle of a bond 
between two silicon atoms denoted by discs. The resulting function W is shown by the 
solid line. The orthogonal Wannier function W obtained by symmetric orthogonalization 
is practically indistinguishable from W on this scale. The calculation was done using 
Density Functional Theory with pseudopotentials 



rather small neighborhood, the density obtained in this way is a good approximation to 
the density one would obtain by doing a calculation in the whole volume occupied by 
the molecule or solid under consideration. The more recent formulation of this theory 
(W. Yang and T-S. Lee 1995) is however also based on the density matrix and we will 
discuss this version in more detail. The idea is to calculate certain regions of the density 
matrix by considering sub-volumes and then to generate the full density matrix by adding 
up these parts with the appropriate weights. How to calculate the density matrix for 
the sub-volumes is in principle unspecified but is usually done using ordinary electronic 
structure calculations based on exact diagonalization. Let us illustrate the principle for 
synthesizing the density matrix from its subparts in a pictorial way by considering a one- 
dimensional chain molecule. For simplicity let us also assume that we have atom centered 
basis functions. 

First one divides the whole computational volume into sub- volumes which we will call 
central regions. Three such central regions are displayed in Figure (|TU|) by dark shading. 
Around each central region one puts a buffer region denoted by light shading. The sum 
of these two regions corresponds to the localization region of the other 0(N) methods. 

Once one has set up this subdivision, one does an electronic structure calculation 
within each localization region to obtain the density matrix belonging to this region. 
These different calculations are only coupled by the requirement that the Fermi level is 
the same in all the localization regions. Typically a very small temperature is used to 
stabilize the search for the overall Fermi level. From the density matrices obtained in this 
way, one cuts off all the corners, i.e. the regions where both matrix indices belong to basis 
functions in the buffer region. This transformation is shown for one localization region in 
Figure (0). 

Using these cross-shaped blocks one then finally synthesizes the density matrix as 
shown in Figure ([T^) by adding up the different contributions. The regions shown in dark 
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Figure 10: The different sub-volumes described in the text which are necessary for the 
application of the DC method to a chain molecule. The atoms are denoted by black dots. 



■ 



Figure 11: From the full density matrix calculated for a certain localization region (left 
hand side) only the cross shaped part (right hand side) is used 



shading which do not overlap with other regions have thereby weight one, whereas the 
overlapping regions indicted by light shading have each weight one half such that the sum 
of the weights is one as well in the overlap regions. 

The achievement of linear scaling in the DC and OM methods is conceptually related. 
In both methods certain parts of the density matrix are calculated independently. The 
main difference is that in the FOE method no calculated parts of the density matrix 
are discarded in the way done in the DC method as depicted in Figure |ll]. The FOE 
method can thus be considered as some DC method where only the central part of the 
density matrix which is not contaminated by the boundary of the localization region is 
calculated. The fact that in the DC method only a small part of the density matrix 
obtained by costly diagonalizations is used, while the largest part associated with the 
buffer region is thrown away results in a large prefactor (Equation (Q)) for this method. 
This is evidently a particularly serious disadvantage if large localization regions are needed 
as will be discussed in more detail in section ^ 

The calculation of the forces acting on the atoms within the DC method is also de- 
scribed by Yang and Lee (1995). Their force formula is based on the Hellmann-Feynman 
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Figure 12: The full density matrix is obtained by adding up the contributions from 
the different localization regions. In this figure only 3 contributions from the localization 
regions of Figure are shown. 

theorem (Feynman 1939) as well as some other terms such as Pulay forces (Pulay 1977) 
which arise from the use of atom centered basis sets and auxiliary charge densities. As 
has been discussed in the case of the FOE method the Hellman Feynman expression 
for the force ( |56D is not exactly consistent with the total energy expression in a non- 
variational method, since it is based on the assumption that one is allowed to take traces. 
Even though the density matrix in the DC method is not calculated via a polynomial 
expansion, the analysis given for the FOE method also applies to the DC method since 
conceptually one can represent any matrix functional of if as a polynomial Taylor ex- 
pansion. The total energy will consequently not have its minimum exactly at the same 
place where the Hellman Feynman forces vanish if both quantities are calculated with the 
DC method. In the case of the FOE method there is a simple analytic expression for the 
calculation of the total energy, even in the case where localization constraints are imposed 
(Equation (p3D). One can therefore differentiate it without using the simplifications aris- 
ing from the use of traces to obtain consistent forces. No such simple prescription can be 
written down for the DC method which would allow the calculation of consistent forces. 
Evidently this compatibility problem becomes negligible for large localization regions and 
there are certainly practical applications where small inconsistencies of forces and energies 
are tolerable. 

3.4 The Density Matrix Minimization approach 

The DMM approach of Li, Nunes and Vanderbilt (1993) is another approach where the 
full density matrix is constructed. In contrast to the FOE method one obtains the density 
matrix F in the limit of zero temperature, so no adjustable temperature parameter enters 
the calculation. The density matrix is obtained by minimizing the following functional 
for the grand potential Q with respect to F 

n = Tr[{3F^ - 2F^){H - /iJ)] . (78) 



31 



There is no constraint imposed during the minimization so all the matrix elements of F 
are independent degrees of freedom. Nevertheless the final density matrix will obey the 
correct constraint of being a projector if no localization constraints are imposed. This is 
related to the fact that the matrix — 2F^ is a purified version of F as can be seen 
from Figure If F has eigenvalues close to zero or one then the purified matrix will 
have eigenvalues that are even closer to the same values. It is also clear from Figure O 
that the eigenvalues of the purified matrix are contained in the interval [0;1] as long as 
the eigenvalues of F are in the interval [ -1/2 ; 3/2 ]. 




Figure 13: The McWeeny (1960) purification function 3x^ — 2x^ 



The gradient of VL as given by Equation (|78D with respect to F is itself a matrix and 
it is given by 

— = 3(F H' + H' F)- 2(F^ H' + F H' F + H' F^) , (79) 
oF 

where H' = {H — fil) . In order to verify that Equation (^) defines a valid functional we 
have to show two things. First, that the grand potential expression ([75| ) gives the correct 
result if we insert the exact density matrix F, and second, that the gradient (|79D vanishes 
in this case. From Equation (|2^) we see that the exact F is a projection operator, i.e 
that F"^ = F. Therefore (3F^ — 2F^) = F and the grand potential expression ( |7^ ) agrees 
indeed with the correct result ([TI|) . Using in addition the fact that H' and the exact 
F commute (as follows from Equations (pO|), (plf ) ) it is also evident that the gradient 
in Equation (|7^) vanishes. The gradient vanishes however not only for the ground state 
density matrix F but also for any excited state density matrix. In order to exclude the 
possibility of local minima, we have to verify, that these stationary points are no minima. 
This can easily be done (D. Vanderbilt, private communication) using the fact that the 
functional is a cubic polynomial with respect to all its degrees of freedom. Let us suppose 
that there are two minima. Inspecting the functional along the line connecting these two 
minima we would obviously again find these two minima, which is a contradiction because 
a cubic polynomial cannot have two minima. Thus we have proved by contradiction that 
the DMM functional has only one single minimum. 
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There is a second thing which is worrying at first sight with this functionaL If the 
density matrix for an insulator at zero temperature is of the correct form (i.e. if the 
occupation numbers n; are integers) the gradient ([79|) will vanish independently of the 
value of the chemical potential. This ambiguity however disappears as soon as one has 
fractional occupation numbers. Let us consider an approximate density matrix of the 
form 

F = J2ni\^i><^i\. (80) 

I 

Then it is easy to see that 

n = J2(e,-^)(3nf-2nf) (81) 



J26{ei-f^)ni{l-ni)\^i><<iJi\. (82) 



'dF 

The polynomial of Equation ([8l| ) is the same as the one shown in Figure |13| and we 
see that components corresponding to eigenvalues larger than the chemical potential are 
damped until they vanish in the minimization process, whereas components corresponding 
to smaller eigenvalues are amplified until they reach the value one. Thus the chemical 
potential will determine the number of electrons to be found in the system as it should. 
The above statements are actually only correct if all the n/'s are contained in the interval 
[-1/2: 3/2]. If this is not the case then one can see from Figure |1^, that there can be 
runaway solutions, where some n; tend to ±oo. When we implemented the scheme we 
however never encountered in practice such a runaway case. 

Having convinced ourselves, that the functional defined in Equation (|78D is well be- 
haved, let us now estimate the number of iterations which are necessary to minimize it. 
As is well known, the error reduction per iteration step depends on the condition num- 
ber K which is the ratio of the largest curvature amax to the smallest curvature amin at 
the minimum. These curvatures could be determined exactly by calculating the Hessian 
matrix at the minimum. Let us instead only derive an estimate of these curvatures by 
calculating the curvature along some representative directions. To do this let us now 
consider a ground state density matrix where some fraction x of an excited state is mixed 
in 

F{x) = J2 ^:(r)^n(r) - x^}{r)^j{r) + x^}(r)^j(r) . (83) 

71=1 

The index / is a member of the N^i eigenstates below fi and the index J refers to a state 
above /x. The expectation value of the OM functional for this density matrix is given by 

n{x) = Tr[{3F{xf -2F{xf){H - (84) 
= Y.^n+{Sx^-2x')iej-ej) 

n=l 

and its second derivative by 

d^n{x) 



dx"^ 



= 6(ej - e,) . (85) 

x=0 
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The largest curvature will roughly be emax — ^min and the smallest curvature of the order 
of the HOMO-LUMO separation Cgap = e^v j+i — ctv , The condition number is thus given 
by 

^max ^max ^min /o/->\ 
K, = . (86) 

In the conjugate gradient method, which is the most efficient method to minimize the 
DMM functional, the error decreases as follows (Saad) 

(87) 

The error is defined in this context as the length of the vector which is the difference 
between the exact and approximate solution at the k-th iteration step. Under realistic 
conditions k is large and the number of iterations na to achieve a certain accuracy is 
therefore proportional to 

njtOcV«=W ■ (88) 

V ^9ap 

This is an important result since it indicates that in an insulator the number of iterations 
is independent of system size. This result is also confirmed by numerical tests. 

The use of a conjugate gradient scheme requires line minimizations along these conju- 
gate directions. For arbitrary functional forms this has to be done by numerical techniques 
such as bisection (Press et ai, 1986). In the case of the DMM functional we have however 
a cubic form along each direction. The four coefficients determining the cubic form can 
be calculated with four evaluations of the functional. Once these 4 coefficients are known 
the minimum along this direction can easily be found. 

Doing a series of minimization steps as outlined above will in general result in a density 
matrix which does not lead to the correct number of electrons. Thus one has to do some 
outer loops where one searches for the correct value of the chemical potential. For better 
efficiency, these two iterations loops can however be merged into one loop where one 
alternatingly minimizes the energy and adjusts the chemical potential (S-Y. Qui et al., 
1994). 

The forces on the atoms are given by 

dn on dF on dH 

+ ■ (89) 



dRa dF dRa dH dRa 



Since the method is variational, |p vanishes at the solution and the force formula simplifies 



to 



dQ dQ dH ^ 
Tr 



(90) 



dR^ dH dRa 

which can easily be evaluated. 

The introduction of a localization region leads again to some subtleties. Whereas in 
the unconstrained case the eigenvalues of the final density matrix F will all be either zero 
or one, this is not any more the case when a localization region is introduced. So the 
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truncated F is not any more a projection matrix but it is given by 



Mb 



m=l 



(91) 



where now are the eigenfunctions of the truncated F and the occupation numbers 
Hm their eigenvalues. In a certain sense the locahzation constraint introduces a finite 
electronic temperature. This is actually not surprising after the discussion of the relation 
between the temperature and the localization properties in section 0. Figure ^ shows 
the energy expectation values of the eigenvectors of F versus the occupation numbers, for 
the case of a crystalline Si cell of 64 atoms, where the localization region extends up to 
the second nearest neighbors. As one sees, the energy expectation values < 
of the eigenvectors of F are very close to the exact eigenvalues of H. 
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Figure 14: An analysis of the eigenvectors of the full and truncated density matrix. 
In the case of the full density matrix the eigenvectors were chosen to be simultaneously 
eigenvectors of both F and H, and the eigenvalues with respect to F (occupation numbers) 
are plotted versus the eigenvalues with respect to H. In the case of the truncated density 
matrix, the eigenvectors cannot anymore simultaneously diagonalize F and H . Therefore 
the eigenvalues with respect to F are plotted versus their energy expectation values with 
respect to H . Note that in the energy expression ( [7^ the purified density matrix 3F'^ — 2F^ 
enters instead of F. The occupation numbers of the purified version are closer to zero or 



one. 



This close correspondence of the eigenvectors of F to the eigenvectors of H explains 
why the number of iterations needed to find the minimum does not increase as one intro- 
duces localization constraints. Equation (|85D remains approximately valid if the occupa- 
tion numbers for the occupied states are close to 1 and if the occupation numbers for the 
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unoccupied states are very small as well as if the energy expectation values < '^m\H\'^rn > 
are close to to the exact eigenvalues of the Hamiltonian. These conditions are fulfilled 
as discussed above. Hence the condition number for the minimization process does not 
change appreciably in the truncated case. 

All the arguments used to prove the absence of local minima remain valid in the 
truncated well. The force formula Equation ( PD| ) remains equally valid. 

An alternative derivation of this algorithm has been given by Daw (1993). He con- 
siders a differential equation which describes the evolution of a density matrix when the 
electronic temperature is cooled down from infinity to zero. The change of the density 
matrix during this process is equal to the gradient of Equation (|79|) . 

3.5 The Orbital Minimization approach 

The OM method (Mauri et al, 1993, Ordejon et al, 1993, Mauri and GaUi 1994, Ordejon 
et al., 1995, Kim et al., 1995) also calculates the grand potential in the limit of zero 
temperature. In contrast to the two previous methods, it does not calculate the density 
matrix directly but expresses it via the Wannier functions according to Equation (pBf). 
These Wannier functions are obtained by minimizing the following unconstrained func- 
tional. 

^ = 2 E E cr^,c^ -EE <^K^<^ E cfcr , (92) 

n i.j n,m ij I 

where c" is the expansion coefficient of the n-th Wannier orbital with respect to the i-th 
basis function and H^j are the matrix elements of the shifted Hamiltonian H — fil with 
respect to the basis functions. In the original formulation (Mauri et al, 1993, Ordejon et 
al., 1993, Mauri and Galli 1994, Ordejon et al., 1995) only N^i orbitals were included in the 
orbital sums in Equation (|9^ ) (i.e. n = l...Nei , m = l...Nei). In the formulation of Kim 
(1995) more than N^i orbitals are included in the sums. The functional of Equation (p^ ) 
can be derived by considering the ordinary band structure energy expression 

^BS = EEcr^:.c- (93) 

and by incorporating the orthogonality constraint by a Taylor expansion of the inverse of 
the overlap matrix O between the occupied orbitals 

o.,^ = Ecrcr (94) 

up to first order. A family of related functionals can be obtained by Taylor expansions 
to higher order (Mauri and Galli 1994, Galli 1996). Since these functionals do not offer 
any significant advantage and are not used in calculations we will not discuss them. The 
gradient of the functional of Equation (|92D is given by 

^ = 4 E Kfl - 2 E E Kj^T E ^r^r - 2 E E ^Kj^ ■ (95) 

k j m j I m i,j 
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Let us first discuss this functional in tlie case wliere no localization constraint is 
imposed on the orbitals. It is easy to see that the functional ( p2D is invariant under 
unitary transformations of the occupied (i.e. N^i lowest) orbitals. So we can derive 
our results in terms of eigenorbitals rather than Wannier orbitals. The coefficients 
are then the expansion coefficients of the eigenorbitals. Using the fact that in this case 
E« c^c™ = 5n,m and that Y.i,j c^H- jdp = 5n,m(en - /u) we obtain 

n ij n,m ij 

= EEcr^-,s" 

= E ~ f^^<^i 



which is the standard expression ( |TlD for the grand potential. Similarly the gradient 
expression can be simplified obtaining 



dc 



- 4 E Hk.^l - 2 E E Hi,cf6r.,^ - 2cl E - /^) (96) 

j m j m 

= 2E^^,,s"-24(e„-/i) = 0. 

j 

So the functional has indeed a vanishing gradient at the ground state and it gives the 
correct ground state energy. As was the case for the DMM functional the gradient vanishes 
not only for the set of ground state orbitals but also for any set of excited states. So we 
have to verify that these stationary points are not local minima but saddle points. We 
do this by picking a certain direction along which the curvature is negative. In the OM 
case an excited state is described by a set of Nei orbitals where at least one index 
n = I corresponding to an occupied orbital / is replaced by an unoccupied orbital J. Let 
us now consider the variation of the grand potential Q{x) under a transformations of the 
form — > cos(x)\l/j + sin(x)^'/. One can show that the curvatures at these stationary 
points is given by 



= -4(ej - e,) . (97) 

=0 

Since the unoccupied eigenvalue ej is higher in energy than the occupied one e/, the 
curvature is negative and we have indeed a saddle point. In the same way we can also 
again show that the condition number is given by Equation (^). 

Also in analogy to the DMM functional one can show (Kim et al., 1995) that in 
the formulation of Kim, the chemical potential ^ determines the number of electrons by 
amplifying components below /i and annihilating components above it. Considering a 
state consisting of a set of eigenfunctions of H where each eigenfunction is multiplied 
by a scaling factor a^, the expectation value for the grand potential becomes 

n = Y.{2-al)al{en-li). (98) 



The relevant function (2 — x'^)x'^ is shown in Figure (|I5D. One can see that the minimum 
of Equation (|98|) is attained by a„ = if e„ > /i and by a„ = ±1 if e„ < /x. Again 
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this is only true if is within a certain safety interval. Otherwise there can be runaway 
solutions. Infinitesimally close to the solution becomes ill defined in an insulator as it 
should. 




Whereas the DMM functional keeps all its good properties when one introduces a 
localization constraint, the OM functional looses most of them. The localization con- 
straint is introduced in the OM functional by allowing each Wannier orbital to deviate 
from zero only within its own localization region. These localization regions are usually 
atom centered and contain a few shells of neighboring atoms. The basic idea of the OM 
functional, namely of describing an electronic system by a set of Wannier functions with 
finite support is already problematic. Orthogonality and finite support are mutually ex- 
clusive properties and so the orbitals which one obtains in the minimization process are 
necessarily non-orthogonal. The true Wannier functions are however orthogonal. In addi- 
tion, as we have seen in the DMM case, a density matrix which is truncated has full rank, 
i.e. none of its eigenvalues is exactly zero. Thus Nei Wannier orbitals are not sufficient 
to represent the density matrix in this case. The generalized formulation of Kim (1995) 
where more than iVg/ orbitals are used alleviates this problem, but does not completely 
fix it unless the number of orbitals is equal to the number of basis functions M^. 

When implemented with localization constraints the OM functional exhibits the fol- 
lowing problems: 

• The functional has multiple minima (Ordejon et ai, 1995). Depending on the initial 
guess one obtains thus different answers, some of which are physically meaningless 
(Kim 1995). As we have shown above in Equation (p7| ) the functional has no multiple 
minima in the non-truncated case. The analysis we used to show this was based 
on the eigenfunctions. Since the Wannier functions have no resemblance to the 
eigenfunctions this analysis cannot be carried over into the localized regime. In the 
case of the DMM method the absence of multiple minima could be proven using the 
fact that the DMM functional is cubic. The OM functional (|92D is however quartic 
with respect to its degrees of freedom and will thus in general have multiple minima. 
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The problem of the multiple minima is attenuated by the formulation of Kim (1995) 
but it is not completely eliminated since the functional still has quartic character. As 
a byproduct of the multiple minimum problem, the total energy cannot be conserved 
in molecular dynamics simulations, which is an important requirement. Here again, 
energy conservation is better in the Kim formulation but still far from perfect (Kim 
1995). 

In practical applications of the OM method for electronic structure methods great 
care is usually taken to construct input guesses which correspond to the physical 
bonding properties of the molecule under consideration (Itoh et al, 1996). If the 
minima that is closest in distance is always selected during the subsequent line min- 
imizations then one will most likely end up in a physically reasonable minima that 
reflects the bonding properties of the input guess (Stephan private communication). 
This is especially true if the localization regions are large and if the topology of the 
total energy surface within a reasonably large region around the physical minimum 
is not too different from the one of the non-truncated case. Such a procedure is of 
course not applicable in systems where the exact bonding properties are unknown. 

• The number of iterations is very large whenever any localization constraint is im- 
posed together with tight convergence criteria. This is due to the deterioration of 
the condition number, a phenomenon which is easy to understand (Ordejon 1995). 
Introducing a localization region destroys the strict invariance of the band structure 
energy under unitary transformations among the occupied orbitals. When the local- 
ization region is large, this invariance will still approximately exist and one can find 
certain directions around the minimum where the energy varies extremely slowly 
and where the curvature is therefore much smaller than the smallest curvature e^ap 
in the unconstrained case. Whereas directions where the curvature is strictly zero 
do not affect the condition number, these very small curvatures will have a negative 
effect on the condition number (Equation (^)) and the required number of itera- 
tions is consequently much larger in the constrained case than in the unconstrained 
case. Even though the condition number deteriorates with increasing localization 
region, the detrimental effect on the number of iterations will disappear at a cer- 
tain point where the gradient due to these small curvatures becomes smaller than 
the numerical threshold determining the convergence criterion of the minimization 
procedure. 

• The optimal localization regions would be centered on the centers of the Wannier 
functions. Since these centers are not known a priori, atom centered localization re- 
gions are usually chosen. In this case the Wannier functions do not generally exhibit 
the correct symmetry (Ordejon 1995). As a consequence molecular geometries ob- 
tained from this functional can have broken symmetry as well. In a Ceo molecule for 
instance there are only two equivalent sites. When treated with the OM functional 
they are however all slightly different (Kim et al., 1995). 

• As follows from Equation ( |9B[ ) there can be runaway solutions. We have encountered 
this problem in test cases with random numbers as input guess. If one would 
construct a more sophisticated input guess, based on the bonding properties of the 



39 



system, this would however probably not occur. In the DMM method the possibility 
of runaway solutions also exists but is never found in practice even with the most 
trivial input guess. 

• If the method is used in the context of self-consistent calculations, where the elec- 
tronic charge density is used to calculate the Hartree and exchange correlation poten- 
tial, problems arise, since the total charge is not conserved during the minimization 
iteration (Mauri and GaUi 1994). 

To overcome the competing requirements of orthogonality and localization, a related 
approach has recently been proposed by Yang (1997) where the orbitals are allowed to 
be non-orthogonal. This approach of Yang has up to now not been applied in connection 
with a localization constraint. 

3.6 The Optimal Basis Density Matrix Minimization method 

Despite its many advantages in the Tight Binding context, the DMM method has the 
big disadvantage that it is very inefficient if one needs very large basis sets (i.e. many 
basis functions per atom). Large basis sets are typically required in grid based Density 
Functional calculations. In this case it just becomes impossible to calculate and store the 
full density matrix in the DMM method even though it is a sparse matrix. From this point 
of view the Wannier function based methods are advantageous since they do not require 
the full density matrix. The basic idea of the OBDMM method (Hierse and Stechel 1994, 
Hernandez and Gillan 1995) is now to contract first the fundamental basis functions into 
a small number of new basis functions and then to set up the Hamiltonian and overlap 
matrix in this new small basis. A generalized version of the DMM method which can 
be applied to the non-orthogonal context (a subject which will be discussed later in the 
article) is then used to solve the electronic structure problem in this basis. The essential 
point is that one tries to do the contraction in an optimal way. This is done by minimizing 
the total energy also with respect to the degrees of freedom determining the contracted 
basis functions Formulated mathematically the density matrix is given by 

F{ry) = J2^:{r)K,,^,{r'). (99) 

The matrix is a purified version of the the density matrix within the contracted basis 
L and it is given by 

K = 3LOL - 2LOLOL , (100) 

where O is the overlap matrix among the contracted orbitals. The main difference be- 
tween the formulation of Hierse and Stechel and of Hernandez and Gillan is that in the 
first formulation the number of contracted basis functions is equal to the number of 
electrons, whereas in the second approach it can be larger. In the formulation of Her- 
nandez and Gillan the basis set can for instance be chosen to have the size of a minimal 
basis set. The difference to standard minimal basis sets from quantum chemistry is that 
it is optimally adapted to its chemical environment since the contraction coefficients are 
not predetermined but found variationally. In practice the full density matrix is found 
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by a double loop minimization procedure. In the inner loop one has the ordinary DMM 
procedure to find the density matrix for a given contracted basis set. In the outer loop 
one searches for the optimally contracted basis functions for fixed L. 

Unfortunately the minimization of the contracted basis functions \E'j is ill conditioned 
(Gillan et ai, 1998) and the number of iterations is therefore at present very large. As 
already explained before ill-conditioning occurs if the curvatures in the minimum along 
different directions are widely different. Three causes for the ill-conditioning are reported 
by Gillan (1998). 

• Length scale ill-conditioning: 

This problem is actually not related to the OBDMM algorithm itself but to the 
(uncontracted) basis functions which are taken to be so-called " Bhp" functions in the 
present implementation. This kind of problem can be found in all iterative electronic 
structure algorithms if grid based basis functions such as finite elements are used. 
Its origin is easy to understand. Let us imagine that we are searching for the lowest 
state of jellium using a localized basis set associated with an equally spaced grid. 
By symmetry the solution is a constant vector, i.e. all basis functions have the 
same amplitude in the solution vector. Let us now assume that we explore the 
energy surface around the minimum along several directions. Let us first "go" into 
a direction where we add components in such a way that the sign of the amplitude 
of each neighboring basis function changes. This corresponds to a high frequency 
plane wave and since the kinetic energy of such a plane wave is big, the total energy 
will rapidly increase if wc add a such a contribution to our solution vector. If 
on the other hand we add contributions that correspond to low frequency plane 
waves the energy will increase much more slowly. Since in grid based methods the 
basis functions are usually narrow and since one can thus construct high frequency 
functions the condition number can be very bad. As one can suspect from the above 
explanation the different curvatures can be estimated by doing a Fourier analysis. 
With this information one can then use preconditiong techniques to cure the length 
scale ill-conditioning problem. Such a scheme has been proposed by Bowler and 
Gillan (1998). 

• Superposition ill-conditioning: 

This ill-conditioning problem is essentially identical to the ill-conditioning problem 
of the OM functional. If we have Ngi contracted basis functions and no locahzation 
constraints the total energy is invariant with respect to unitary transformations of 
these functions. The introduction of a localization constraint destroys this invari- 
ance but there is an approximate invariance left with manifests itself in very small 
curvatures in the minimum along certain directions. 

• Redundancy ill-conditioning: 

This problem can only be found in the formulation of Hernandez and Gillan, where 
the number of contracted basis functions is larger than the number of electrons. In 
this case one spans a space that contains not only the occupied orbitals but also some 
unoccupied. As was shown before in the context of the DMM functional introducing 
a localization constraint will not assign zero occupation numbers, but only very small 
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occupation numbers to components corresponding to the unoccupied states in the 
unconstrained case. Since these components corresponding to the unoccupied states 
have now very httle weight they have httle influence on the total energy and one 
has again certain directions where the total energy changes very slowly resulting in 
very small curvatures. 

Another open question is whether the OBDMM has local minima. The functional is 
a 6-th order polynomial with respect to the expansion coefficients of the contracted basis 
functions as can be seen from Equation (pPj) and (|1(J(J|). The two overlap matrices in 



Equation ( |10(]| ) give each a quadratic term, the two contracted orbitals in Equation (|99|) a 



linear term. Minimization with respect to the contracted basis functions should therefore 
exhibit local minima. Local minima have however not been reported with this method so 
far. Perhaps the following DMM minimization step which is free of local minima saves 
the method from overall local minima. 



4 Comparison of the basic methods 

It is certainly not possible to claim that a specific method is the best for all applications. 
Nevertheless the methods presented so far differ in many respects and one can therefore 
clearly judge under which limiting circumstances certain methods will fail or perform well. 
In the following the methods presented so far will therefore be compared under several 
important aspects. The comparison will be done in two categories. The first category 
applies to electronic structure methods requiring a small number of degrees of freedom 
per atom. The Tight Binding method belongs to the first category requiring a few basis 
functions per atom (or just a few degrees of freedom in the case of semiempirical Tight 
Binding). But we will also include the standard quantum chemistry methods into this 
first category, where one typically needs from a few up to a few dozen Gaussian type 
basis functions per atom. The second category contains methods which are grid based 
such as finite difference schemes (Chelikowsky et ai, 1994), or where the basis functions 
can be associated with grid points such as in finite element basis functions (White et ai, 
1989) or blip (Hernandez et ai, 1997) basis functions. In these methods one has typically 
many hundred degrees of freedom per atom. Even though the density matrix is a sparse 
matrix, 0(N) methods which calculate the full density matrix can not be applied to the 
second category of electronic structure methods. The memory requirements alone are 
already prohibitive. As pointed out before we can expect that the localization region in 
a 3-dimensional structure comprises on the order of 100 atoms. The density matrix will 
exhibit significant sparseness only for larger system. Assuming that we just have 100 basis 
functions per atom the storage of the density matrix would require about 1 Gigabyte of 
memory which is the upper limit of current workstations. The comparison in the large 
basis set class will therefore comprise only the methods which are Wannier function based 
namely FOP, OM and OBDMM. The comparison in the small basis set class will comprise 
FOE, DC, DMM and OM, excluding two methods which are explicitly targeted at large 
basis sets, namely FOP and OBDMM. 
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4.1 Small basis sets 



The comparison of the methods apphcable to small basis sets is based on the following 
criteria: 

• Scaling with respect to the size of the localization region: 

The size of the localization region is taken as the number of atoms contained within 
it. Only the FOE method has a linear scaling with respect to the size of the local- 
ization region. As one increases the size of the localization region the nonzero part 
of each column of the Chebychev matrices increases linearly implying also a linear 
increase in the basic matrix time vector multiplication part. In the DMM method 
the CPU time increases quadratically since the numerical effort for the basic ma- 
trix times matrix multiplications grows quadratically with respect to the number of 
off-diagonal elements of the matrix. Neglecting ill-conditioning problems, the OM 
method exhibits quadratic scaling, since the numerical effort for the calculation of 



the overlap matrix (Equation (p^)) between the Wannier orbitals increases quadrat- 
ically. As the localization region grows there are more matrix elements and the 
calculation of each matrix element is more expensive since each orbital is described 
by a longer vector. Because the ill-conditioning problem becomes more severe for 
large localization regions the number of iterations increases in reality in a way which 
is difficult to model resulting in an effective scaling that is stronger than linear. The 
DC method has a cubic scaling with respect to the size of the localization region 
if each sub-volume is treated with diagonalization schemes. From the comparison 
of the scaling behavior of all these methods one can thus conclude, that the FOE 
method will clearly perform best if large localization regions are needed. The FOE 
method is thus also the only method which can be faster than traditional cubically 
scaling algorithms if no localization constraints are imposed. In this case its overall 
scaling behavior is quadratic whereas all other methods have a cubic scaling with a 
prefactor which is significantly larger than the one for exact diagonalization. 

Scahng with respect to the accuracy: 

A detailed comparison of the polynomial FOE method and the DMM method under 
this aspect has recently been given by Baer and Head-Gordon (1997) for systems of 
different dimensionality. Their analysis is based on the assumption that the decay 
constant 7 is given by the tight binding limit of Equation (|37D. They conclude, that 
in the one dimensional case the DMM has the best asymptotic behavior, but its 
prefactor is much larger than the one of the FOE method, so that the FOE method 
is more efficient in the relevant accuracy regime. In the two dimensional case they 
have the same asymptotic behavior, but the FOE method has again a much smaller 
prefactor. In the most relevant three dimensional case the FOE method has both 
the best asymptotic behavior and prefactor. These results are plausible after the 
preceeding discussion of the scaling with respect to increasing localization region 
size. When one wants to improve the accuracy the most important factor is the 
enlargement of the localization region. It is also clear that in higher dimensions the 
number of atoms within the localization region grows faster than in lower dimensions 
and that the scaling with respect to the number of atoms will thus become the 
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decisive factor in 3 dimensions. In lower dimensions the number of iterations has 
higher relative importance, favoring the DMM method. A comparison of the FOE 
and DMM method applied to quasi two-dimensional huge tight binding fullerenes 
by Bates and Scuseria (1998) is also in agreement with the above statements. They 
found that the FOE and DMM methods give nearly the same performance with a 
small advantage for the FOE method. 

As discussed before, the scaling of the OM and DC methods is stronger than 
quadratic with respect to the size of the localization region. It is therefore clear 
that the required numerical effort for increased accuracy will grow even faster than 
in the DMM method. 

Scaling with respect to the size of the gap: 

In the FOE method the degree Upi of the Chebychev polynomial increases linearly 
with decreasing gap (Equation (|69D ). At the same time the density matrix decays 



more slowly. It follows from Equation (37) that the linear extension of the localiza- 
tion region grows as e^^^ in the applicable weak binding limit. The volume of the 
localization region and the number of atoms contained in it grow consequently as 
e~^p. Taking into account the number of iterations (Equation (|^)), the total nu- 
merical effort increases thus as e~l. In the DMM method the number of iterations 
also increases with decreasing gap but more slowly namely like e"^^^ as follows from 
Equation (^8)). Taking into account the above discussion of the scaling properties of 
the DMM method with increasing localization region we obtain the overall scaling of 
e~^p^/^, which is higher than the scaling behavior of the FOE method. Obviously the 
scaling behavior of the OM and DC methods is worse. So in contrast to what one 
might first think the FOE methods performs best in this limit. In three-dimensional 
metallic systems, the FOE method is thus to be expected to be the only method 
which will work efficiently at good accuracies. 

Finding a first initial guess: 

No initial guess is required in the FOE and DC methods (except perhaps for the 
potential in a selfconsistent calculation). In the DMM method an extremely simple 
and efficient input guess for the density matrix is just a diagonal matrix that sums 
up to the correct number of electrons. In the OM method this point is somewhat 
problematic. As mentioned above a Wannier function represents typically a bond 
or lone electron pair. So if one can draw the standard Lewis structure of a molecule, 
where bonds are denoted by lines and a lone electron pair by a pair of dots, one knows 
where the Wannier functions should be centered and the Lewis formula can be the 
basis for the initial guess. This procedure can not be used any more if the molecule 
is characterized by two or more Lewis structures that are resonating. Especially if 
the two Lewis structures correspond to an electron transfer over a distance which is 
larger than the range of the localization region, serious problems are to be expected 
with Wannier function based methods. In such a case it might actually not only be 
impossible to find an initial guess, but it might also be impossible at all to describe 
such a molecule by Ng^i localized Wannier functions. 

Number of iterations in electronic structure calculations: 
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In the variational methods (OM, DMM) the number of iterations depends on the 
condition number of the energy expression. As pointed out the OM energy ex- 
pression is ill conditioned under localization constraints and therefore the required 
number of iterations is very large. Even for modest accuracy several hundred it- 
erations are required ( Mauri and Galli 1994, Ordejon et ai, 1995). In the DMM 
method on the other hand the number of iterations is equal to the number of iter- 
ations one needs in ordinary electronic structure calculations (non 0(N)), namely 
20 to 30 (Nunes private communication). The quantity that corresponds to the 
number of iterations in the FOE method is the degree Upi of the polynomial. While 
it is difficult to compare the cost of one Chebychev recursion step with the cost of 
a DMM minimization step, such a comparison can be done in the case of the OM 
method. In each OM line minimization step one has to calculate the minimum of a 
quartic polynomial which requires at least 3 applications of H to the wavef unction. 
One Chebychev recursion step requires one application of H. 

• Number of iterations in molecular dynamics simulations: 

In molecular dynamics simulations as well as structural relaxations steps and self- 
consistent mixing schemes the density matrix or respectively the Wannier functions 
from the previous step are a good input guess for the next step. Good initial input 
guesses are beneficial in all methods except in the polynomial FOE method and 
the DC method. It is difficult to quantify the possible savings of such a reuse. To 
preserve the quality of the solution of the preceeding step as an input guess in a 
Molecular Dynamics simulation, it can be necessary to make the time step smaller 
than the integration scheme would allow. How large the maximum time step can be 
depends of course also on the order and properties of the time integration scheme 
used to propagate the Molecular Dynamics simulation. Similar remarks apply to 
the case of structural relaxations. The decisive factor determining the number of 
iterations per molecular dynamics step is in this context again the condition number 
of the functional. With the DMM methods of the order of 2 to 3 steps are needed 
both for accurate molecular dynamics simulations (Qiu 1994) and for structural re- 
laxations (Nunes private communication). The smallest number of iterations that 
was used in molecular dynamics simulations with the OM method was 10, but at 
the price of a very poor energy conservation. 

• Cross over point for standard Tight Binding systems: 

The FOE method has the lowest reported cross over point for the standard carbon 

test-system in the crystalline diamond structure. For the FOE method it is around 
20 atoms (Gocdccker 1995), and for the DMM it is estimated (Li et ai, 1993) to 
be around 90 atoms. No crossover points were ever given for the OM and DC 
method and presumably they are much higher. All quoted cross over points for 
electronic structure calculations are for an accuracy of roughly 1 percent in the 
cohesive energies, but in the relevant publications not all computational details 
are listed to ensure that these numbers are really comparable in all respects. The 
low crossover point of the FOE method can be understood in terms of its scaling 
behavior with respect to the size of the localization region discussed above. For 
small systems the size of the localization region equals the size of the whole system. 
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The FOE method therefore starts off with a quadratic scahng behavior, whereas all 
other methods start off with a cubic behavior. Consequently the cross over point for 
all other methods can only be for system sizes larger than the localization region, 
whereas the crossover point in the FOE method can already be at smaller system 
sizes if it is implemented efficiently. 

In the context of molecular dynamics simulations the cross over points are different 
because some of the variational methods can benefit from good input guesses. For 
the FOE method the cross over point remains at 20 atoms, for the OM method 
Mauri and GaUi quote 40 atoms, and for the DMM method Qui et al. (1994) 
quote 60 atoms. Again no crossover point is given for the DC method. It has to 
be stressed however that the number quoted by Qui et al., (1994) was for highly 
accurate molecular dynamics runs where the total energy was conserved, while in the 
benchmarks of Mauri and Galli no satisfactory energy conservation was obtained. 

• Influence of the range of a sparse Hamiltonian matrix on the performance: 

In the FOE method the numerical effort increases strictly linearly with respect to 
the number of nonzero elements per column which depends cubically on the range of 
the Hamiltonian matrix. In the case of the DMM method it can be shown (Li et al, 
1993) that one has to calculate intermediate product matrices only up to a range 
which is the sum of the range of the density matrix and the Hamiltonian matrix. As 
long as the range of the Hamiltonian is small compared to the range of the density 
matrix the number of operation increases therefore only very weakly with respect 
to an increasing Hamiltonian range. The DMM method therefore outperforms the 
FOE method under such circumstances (Daniels and Scuseria 1998). Hamiltonian 
matrices of relatively large range are found in the context of Density Functional 
calculations using Gaussian basis sets. For Tight Binding calculations, in contrast, 
the range of the Hamiltonian is usually small. The OM method shows the same 
behavior as the FOE method. The numerical effort increases linearly with respect 
to the number of nonzero elements per column of the Hamiltonian. In the DC 
method the numerical effort is independent of the bandwidth, but it is not expected 
that even in this case the DC method might outperform the FOE or DMM method. 

• Scaling with respect to the size of the basis set: 

Let us now consider the case, where the number of atoms as well as all other rele- 
vant quantities, such as the size of the localization region, are fixed and where we 
only increase the number of basis functions per atom rrib. Both the size and the 
number of off-diagonal elements per column of the density matrix will then increase. 
We will also assume that the Hamiltonian is a sparse matrix with ruh off diagonal 
elements per column. In the DMM method the numerical effort will consequently 
grow cubically with respect to m^, since the number of operations needed for the 
multiplication of two sparse matrices of size n with m off-diagonal elements per 
column is proportional to n m^. The DC method scales cubically as well since it 
is based on diagonalization. The FOE method equally scales cubically, since three 
factors are increasing. The number of columns in the density matrix, the number of 
coefficients in each column and the number of off-diagonal elements of the Hamilto- 
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nian matrix. In addition to the arguments showing the unreahstically large memory 
requirements of these methods when used with large basis sets, we thus also find a 
cubic scaling which prohibits the use of these algorithms in this context. 

In the OM method both the application of the Hamiltonian to the orbitals as well as 
the calculation of the overlap between the orbitals scale quadratically with respect 
to rrif,. 

• Memory requirements: 

The DMM method requires the storage of the whole sparse density matrix. If one 
takes advantage of the fact that the matrix is symmetric storage can actually be 
cut into half. The OM method requires only the storage of the truncated Wannier 
orbitals and so the memory requirements are reduced by about 50 percent in the 
typical Tight Binding context compared to the case where one stores the all the 
nonzero elements of the density matrix without taking advantage of its symmetry. 
If the Kim formulation is used the gain can come down to less. In both the DC 
and FOE method only the subparts respectively the columns of the density matrix 
which are consecutively calculated need to be stored. The storage requirements are 
therefore greatly reduced compared to the DMM and OM methods, namely by a 
factor of roughly N^i- 

• Parallel implementation: 

Parallel computers and clusters of workstations are standard tools in the high perfor- 
mance computing environment. The suitability of an algorithm for parallelization is 
therefore also an important aspect. It is of course always possible to parallelize any 
program, the question is just whether this can be done in a coarse or fine grained 
way, i.e. with a small or large communication to computation ratio. Only a coarsed 
grained parallel program will run efficiently on clusters of workstations with rela- 
tively slow communications as well as on a very large number of processors of a 
massively parallel computer. Both the FOE and the DC algorithms are intrinsically 
parallel algorithms, meaning that the big computational problem is subdivided into 
smaller subproblems which can be solved practically independently. In the case 
of the FOE method (Goedecker and Hoisie 1997) the calculations of the different 
columns of the density matrix are practically independent (Equation (^)). In the 
case of the DC method the calculations of the different patches of the density matrix 
are practically independent as well. Both methods can therefore be implemented 
in a coarse grained way. A parallel program based on the FOE method won the 
1993 Gordon Bell prize in parallel computing for its outstanding performance on a 
cluster of 8 workstations, obtaining half of the peak speed of the whole configura- 
tion (Goedecker and Colombo 1994). Impressive speedups of up to 400 have been 
obtained with the FOE method on a 800 processor parallel machine (Kress et ai, 
1998). Even though it is more difficult to implement the OM method in parallel two 
such implementations have been reported. In the OM method two parallelization 
schemes are conceivable. In the first scheme (Canning et ai, 1996) one associates to 
each processor a certain number of localized orbitals. This data structure is optimal 
for the application of the Hamiltonian to the orbitals, but requires communication 
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for the calculation of the overlap between the orbitals. The second scheme (Itoh 
et al., 1995) associates the coefficients of all the orbitals whose localization region 
has an intersection with a certain region of space to a certain processor. This data 
layout is optimal for the calculation of the overlap matrix, but requires commu- 
nication for the application of the Hamiltonian. The OBDMM method has also 
been parallelized (Goringe et al., 1997b). Since the OBDMM is more complex than 
the other methods that have been implemented on parallel machines three different 
parallelization paradigms are required. 

• Quality of forces: 

In the case of the variational (DMM and OM) methods the force formula is par- 
ticularly simple (Equation |9y) since only the Hellman Feynman term survives. It 
has to be stressed that this formula is however only exact if one has succeeded in 
reducing the gradient with respect to all variational quantities really to zero. If, 
in a simulation, the gradient is not reduced to zero within the required precision 
because too many iterations would be required, errors will creep into the calculated 
forces, making them inconsistent with the total energy. From this point of view the 
situation is easier in the FOE method. Since the FOE method is not an iterative 
method (in the sense that one iterates until a certain accuracy criterion is met), the 
force formula of Voter (Equation (^)) will always give forces consistent with the 
total energy. As has already been discussed no consistent force formula exists for 
the DC method. 

Consistent forces are a prerequisite for the conservation of the total energy in Molec- 
ular Dynamics simulations. Even with consistent forces there are however other 
factors which can cause deviations from perfect total energy conservation in Molec- 
ular Dynamics simulations such as finite time steps and events where atoms enter 
or leave the localization region. 

• Cases where the methods become inefficient: 

Cases where different methods become inefficient have already implicitly been pointed 
out when discussing the previous criteria. Let us finally mention a special case where 
the FOE method is inefficient. As discussed above a small gap implies usually highly 
extended density matrices and the FOE method is highly competitive. There can 
however be exceptions to this rule. If by symmetry restrictions there is practically 
no coupling between two well localized states, which are close together, their energy 
levels can be split by a very small amount. If the Fermi level falls in between these 
two levels a very high degree polynomial is needed to separate these two levels in 
an occupied and an unoccupied one. This scenario can be found in the case of a 
vacancy in the silicon crystal. A Jahn Teller distortion leads to a very small split- 
ting between an occupied and an unoccupied gap level. Using a high electronic 
temperature and a low degree polynomial in the FOE method does not reproduce 
this Jahn Teller distortion. A detailed study of this effect is given by Voter et al, 
(1996) showing that a polynomial of degree 200 is needed instead of the typical 
polynomial of degree 50 needed for bulk silicon in the Tight Binding context. From 
an energetic point of view it is frequently not necessary to track such Jahn Teller 
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distortions, since they lead to a rather small energy lowering. In molecular dynamics 
simulations of metallic systems this suppression of the opening of a gap can even be 
beneficial (Goedecker and Teter 1995) since it leads to a smoother density of states 
around the Fermi level. 

In summary, we see that the performance depends critically on many parameters which 
can change from one application to another. Performance superiority claims based on test 
runs of a particular system have therefore to be taken with caution. 

4.2 Large basis sets 

Whereas the methods which are mainly applicable in the context of small basis sets showed 
important differences under the various comparison criteria, the behavior of the FOP, OM 
and OBDMM methods are quite similar under most of these criteria. The comparison 
of the methods which are applicable to large basis sets will therefore be based only on a 
smaller set of important criteria. 

• Scaling with respect to the size of the basis set: 

As pointed out before the methods compared in this sections all have a reasonable 
scaling with respect to the size of the basis set allowing thus their use in the context 
of very large basis sets. In contrast to the discussion of the same point within the 
small basis set framework, the number of nonzero matrix elements of the Hamilto- 
nian is typically independent of the resolution of the grid, i.e. of the number of basis 
functions. The most important part of the FOP, OM and OBDMM algorithms, the 
application of the Hamiltonian matrix to a wave vector scales therefore linearly. At 
the same time all these algorithms require at some stage the calculation of an over- 
lap matrix among the occupied orbitals. This part scales quadratically as discussed 
before. 

• Finding a first initial guess: 

As discussed in the comparison part dealing with small basis sets, it can be difficult 
to find an initial guess for Wannier function based methods. This problem does not 
exist in the OBDMM method if the number of orbitals is larger than the number of 
electrons. In this case the orbitals are just basis functions and by analogy with the 
usual tight binding or LCAO basis sets it should always be possible to generate a 
physically motivated initial guess for these orbitals. 

• Required number of iterations: 

As mentioned both the OM and OBDMM methods suffer from ill-conditioning prob- 
lems and require therfore a frequently excessive number of iterations for the iterative 
minimization. No such ill-conditioning exists for the FOP method. 

• Cases where the methods become highly inefficient: 

None of the 3 methods have ever been applied to metallic systems, and they are all 
expected to fail in this case. 



49 



5 Other 0(N) methods 



The recursion method is a well established method which also exhibits 0(N) scaling. It 
is principally a method to calculate the density of states -D(e), but once the density of 
states is known, the band structure energy can be calculated by integrating eD{e) up to 
the Fermi level. The recursion method has been described extensively (Haydock 1980, 
Gibson et al, 1993) and we will therefore not review it. Let us just point out that within 
the original formulation of the recursion method only the diagonal elements of the density 
matrix could be calculated. For the calculation of the forces one would however also need 
the off-diagonal elements. So the applicability of the recursion method is significantly 
reduced compared to the 0(N) method described in section ^, which all gave access to 
the forces. There have been several attempts to overcome this limitation (Aoki 1993, 
Horsfield et al., 1996b, Horsfield et al., 1996c, Bowler et al., 1997). In contrast to the 
methods of section ^ these Bond Order Potential (BOP) methods are fairly complex 
and difficult to implement. The basic idea in the BOP method is to calculate the off 
diagonal elements of the density matrix as the derivative of diagonal elements of a density 
matrix defined with respect to a transformed basis. Even though it is now possible to 
calculate forces within the BOP method, they are unfortunately not consistent with the 
total energy. In another version of the recursion method the GDOS method (Horsfield 
et al., 1996b, Horsfield et al, 1996c) the exact forces can be calculated. It is however 
necessary to calculate some generalized moments called interference terms from the 
recursion coefficients. This inversion is ill conditioned and becomes unstable if one tries 
to calculate more than 20 moments (Bowler et al., 1997). With such a low number of 
moments it is however not possible to describe many realistic systems (Kress and Voter 
1995) and the error one reaches when the instability sets in is frequently still much too 
large to be acceptable (Bowler et al, 1997). So recursion based methods seem not to be 
a general purpose tool for electronic structure calculations where accurate energies and 
forces are required. BOP methods can however give insight into basic bonding properties 
of crystalline solids (Pettifor). Since in these methods related to the recursion algorithm 
all the diagonal elements of the density matrix have to be calculated, they are obviously 
not very efficient if a very large number of basis functions per atom is used and they were 
indeed primarily proposed for Tight Binding or other schemes with a small number of 
basis function. The only exception is a version proposed by Baroni et al. (Baroni and 
Giannozzi 1995) who suggest to use delta functions as a basis in a Density Functional type 
calculation. With his basis set the diagonal elements of the density matrix are enough to 
determine the charge density, whereas for more general basis functions the off diagonal 
elements are needed as well (Equation (p!2D). Because the number of delta functions has to 
be very large even in the most favorable case of silicon, the crossover point was estimated 
to be around 1000 silicon atoms. This method is therefore clearly not competitive with 
most other methods where the cross over point is much lower. 

Another approach to improve the scaling properties is based on so called pseudo- 
diagonalization (Stewart et al, 1982). The method is closely related to the well known 
Jacobi method for matrix diagonalization. Whereas in the original Jacobi method rotation 
transformations are applied until all off-diagonal elements vanish, only the entries which 
couple occupied and unoccupied states are annihilated in the pseudo-diagonalization 
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method. One obtains thus not the occupied eigenvectors of the Hamiltonian but an 
arbitrary set of vectors which span the same occupied space. In its original formulation 
(Stewart and Pulay 1982) this method still had cubic scaling however with a smaller pref- 
actor than complete matrix diagonalization. Nearly linear scaling has been reported with 
this method (Stewart 1996) if the Hamiltonian matrix is constructed with respect to a set 
of well localized orbitals. In this way most of the elements in the block coupling occupied 
and unoccupied states are zero at the start of the transformations. The annihilation of 
certain matrix elements during the rotation steps causes only a controlled spread of other 
rows and columns in the matrix. So at the end each column and row extends over a region 
which is comparable to the localization region in other 0(N) methods. 

A method proposed by Galli and Parrinello (1992) can be considered as a predecessor of 
the OM method. The total energy is minimized with respect to a set of localized Wannier 
functions. In contrast to the OM method one has however the old conventional functional 
(Equation (|93|) ) which has to be minimized under the constraint of orthogonality, whereas 
in the OM method the orthogonalization constraint is contained in the modified functional 
(Equation (|92D). In this scheme it is necessary to calculate the inverse of the overlap 
matrix between the Wannier functions. From timing considerations this is not a serious 
drawback since this part is only a small fraction of the total workload even for large 
systems and even if it is done with a scheme which scales cubically. There are, however, 
problems of numerical stability. As pointed out by Pandey et al. (1995) the overlap matrix 
becomes close to singular and the introduction of localization constraints can under these 
circumstances falsify the results. 

Also vaguely related to the OM functional are methods where certain constrains are 
included by a penalty function. Wang and Teter (1992) included the orthogonality con- 
straint in this way, Kohn (1996) the idempotency condition of the zero temperature density 
matrix. 

0(N) implementations of electronic structure methods based on the multiple scattering 
theory have also been reported. In the simplest version (Wang et al., 1995) it is essentially 
a DC method with the difference that within each localization region the calculation is 
done with the multiple scattering method. A further development was to replace the buffer 
region by an effective medium (Abrikosov et al., 1995, Abrikosov et al., 1996). This can 
considerably reduce the prefactor of the calculation, especially in metallic systems where 
large localization regions are needed. For this class of methods no force formulas have 
however been reported, a deficiency restricting their applicability. 

A scheme which possibly leads to a reduced scaling behavior has also be proposed 
by Alavi et al, (1994) It is based on a direct representation under the form of a sparse 
matrix of the Mermin finite temperature functional (Mermin 1965). So it allows for a 
finite temperature as does the FOE method. 

As was already mentioned the polynomial FOE method becomes inefficient in cases 
where one has a large basis set causing the highest eigenvalue to grow very large. This 
would necessitate a Chebychev polynomial of very high degree. A elegant method to 
overcome this bottleneck within a polynomial method has been proposed by Baer and 
Head-Gordon (1998). They write the density matrix at a low temperature T as a telescopic 
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expansion of differences of density matrices at higher temperatures Tq^ . 

Ft = Fxgn + {FTqn-l — FTgn ) + {FTgn~2 — Fpgn-l) + ... + (F^gO — -pTgl ) (101) 

The factor q determining the geometric sequence of temperatures is chosen from consid- 
erations of numerical convenience. As the temperature is lowered the numerical rank of 
each difference term becomes smaller and smaller since the difference of two Fermi dis- 
tributions of similar temperature is vanishing to within numerical precision over most of 
the spectrum. Hence it is necessary to find Chebychev expansions only over successively 
smaller regions of the spectrum and it is also possible to calculate the traces (which in 
turn determine all physically relevant quantities) within spaces of smaller and smaller 
dimension. 

Ordejon et al. (1995) proposed a method based on the OM method to calculate 
phonons from the electronic structure with linear scaling. 

This article concentrated on 0(N) methods which are able to calculate the total energy 
of a system as well as its derived quantities such as the forces acting on the atoms. Let us 
finally still point out that there are also several 0(N) methods which primarily calculate 
the density of states and thus give information about the eigenvalue spectrum of a system. 
These methods (Drabold and Sankey 1993, Wang 1994, Silver and Roeder 1997) are not 
described in this article. In principle, it would also be possible to derive band structure 
and total energies from the spectrum by an integration up to the chemical potential. 
Attempts in this direction have however not been very successful up to now with the 
exception of the smeared density of states Kernel Polynomial method (Voter et al., 1996) 
which is closely related to the FOE method. 

6 Some further aspects of 0(N) methods 

Hierse and Stechel (1994, 1996) examined whether non-orthogonal Wannier like orbitals 
are transferable from one chemical environment to another similar one. If this was the 
case one could reuse precalculated Wannier orbitals as a very good initial starting guess. 
Such a property would thus reduce the prefactor of any method regardless of its scaling 
behavior. Unfortunately, they could find reasonable transferability only under rather 
restrictive conditions. When they added CH2 units to a C„if2n+2 polymer they found 
good transferability of the orbitals associated with this building block within a Density 
Functional scheme (Hierse and Stechel 1994). As soon as they started bending the polymer 
(Hierse and Stechel 1996), the transferability was however lost in the Density Functional 
scheme. Only in a non-self-consistent Harris functional scheme some efficiency gains were 
still possible. 

Hernandez et al. (1997) suggest a solution to the basis set problem in 0(N) methods. 
As was mentioned in section ^ 0(N) techniques are difficult to reconcile with extended 
basis sets such as plane waves. Plane waves have however several important advantages 
and are therefore widely used in conventional (i.e. not 0(N)) electronic structure calcu- 
lations. One of their main advantages is that the accuracy can easily be improved by 
increasing one single parameter, namely the minimal wavelength which corresponds to 
the resolution in real space. Hernandez et al. proposed now a basis set of "blip" functions 
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which combines both advantages. It is locahzed and its resolution can systematically be 
improved. As an alternative to the "bhp" functions one could also use finite difference 
techniques (Chehkowsky et al, 1994) or wavelets (Lippert et al, 1998, Goedecker and 
Ivanov 1998b, Arias 1998, Goedecker 1998b) since they share the same advantages. As 
shown by Goedecker and Ivanov (1998c) wavelets allow for highly compact representation 
of both the density matrix and the Wannier functions, since they are localized both in 
real and Fourier space. 

Much of the work of Ordejon (1996) is also based on a new set of basis functions 
proposed by Sankcy and Niklewski (1989). This basis set consists of atomic orbitals 
which are modified in such a way that they are strictly zero outside a certain spherical 
support region. This then gives rise to a Hamiltonian matrix which is strictly sparse. 
By tabulating these matrix elements it is possible to do Density Functional calculations 
whose computational requirements are in between the requirements of traditional Density 
Functional calculations and Tight Binding calculations. Obviously one has to find a 
compromise between accuracy and speed. If the basis functions extend about a larger 
support region, one has a more accurate basis, but the the numerical effort increases 
because the Hamiltonian is less sparse. 

Horsfield and Bratkovsky (1996d) have incorporated entropy terms in 0(N) methods 
within the FOE algorithm. As soon as one has systems at nonzero temperature, these 
terms should in principle be added, however in most systems their effects are very small at 
room temperature. For computational convenience temperatures much larger than room 
temperature can however be used. Wentzcovitch et al. (1992) and Weinert and Davenport 
(1992) showed that the inclusion of entropy gives simplified force formulas, since only the 
Hellmann-Feynman term survives. The free energy A is given by 

A = Tr[F H -keT S{F)], (102) 

where the entropy 5" is a matrix function of F 

S = - {F ln{F) + {1 - F) ln{l - F)) . (103) 

Writing as a Chebychcv polynomial in H and analyzing everything in terms of the 
eigenfunctions of H they find that one has to do a Chcbychev fit to a distribution very 
similar to the Fermi distribution just with some additional features close to the chemical 
potential. Using a formafism by Gillan (1998) they then extrapolate their results to zero 
temperature obtaining faster convergence to the zero temperature limit in this way. Let 
us stress again, that with the FOE method it is possible to build up density matrices 
corresponding to several temperatures without significant extra cost. A set of generalized 
Fermi distributions which allow an efficient extrapolation to the zero temperature limit 
by eliminating arbitrarily high powers of T has also been proposed by Methfessel and 
Paxton (1989) 

As was mentioned in the introduction the fundamental cubic term in an electronic 
structure calculation based on orbitals comes from the orthogonalization requirement. 
In traditional pseudopotential calculations based on a Fourier space formulation there is 
however a second very large cubic term, arising from the nonlocal part of the pseudopo- 
tential. This second cubic term can be eliminated by using pseudopotentials which are 
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separable in real space (King-Smith et al, 1991, Goedecker et al., 1996, Hartwigsen et 
al, 1998) 



7 Non-orthogonal basis sets 

Up to now we have always implicitly assumed that we are dealing with orthogonal basis 
sets, i.e. that 

j 0*(r)0,(r)rfr = <5,,,-. (104) 
Non-orthogonal basis sets give rise to an overlap matrix S*, 



S,,= y 0*(r)0,(r)dr. (105) 

An orthogonality relation similar to Equation ( |104| ) can be obtained by introducing the 
dual basis functions 0j(r) given by 

U^) = Y.SijH^), (106) 

i 

where is the inverse of the overlap matrix S. It is then easy to verify that 

>*(r)0,(r)dr = 5,,. (107) 



As has been mentioned in section ^ all realistic atom centered localized basis sets are 
non-orthogonal. Within the Tight Binding context, there are also many non-orthogonal 
schemes on the market. There is thus certainly a need to apply 0(N) techniques also for 
these schemes. Most of the basic 0(N) algorithms presented previously have therefore 
been generalized to the non-orthogonal case and we will present these generalizations in 
the following. In the context of a non-orthogonal scheme one has to distinguish care- 
fully between the eigenfunctions \E'„ and the associated eigenvector c" which contain the 
expansion coefficients c" such that \E'„(r) = I]iC"0j(r). The eigenvector c" satisfies the 
generalized eigenvalue equation 

Hc^ = enSc^ . (108) 

In the same way one has to distinguish carefully between the density matrix operator 
F(r, r') and the density matrix Fjj itself. While the expression (^) for the density 
matrix operator remains the same, 

F(r,rO = 5:/(e„)vl/:(r)vI/„(r') = J] /(6„) E Cc-0:(r)0,(r') (109) 

n n ij 

the expression for the number of electrons (Equation (p!3|)) is modified to 

N,i = Tr[F S] . (110) 



The expression for the band structure energy (|T0|) however remains valid. In the definition 
of the density matrix Fij (Equation (D) one has to use now the dual basis functions 
instead of the ordinary. 



F,, = / / 0:(r)F(r, v')<P,{r')dvdr' = cTc^fie^) (111) 
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This replacement can have important consequences for the locahty of the density matrix 
Fjj. If we have a set of locahzed orthogonal basis functions (the only known set of basis 
functions with this property are the Daubechies scaling functions), whose extension is less 
than the oscillation period of the density matrix operator then Equation (|^) ensures that 
will have the same decay properties as F{r, r'). This does not necessarily hold true for 
Equation ( |111| ). Even if the basis functions 0i(r) are well localized this is in general not 



true for their duals 0j(r). If the duals have a very slow decay then this slow decay will be 
inherited by Fij and it might not be possible to use 0(N) algorithms for the calculation 
of Fij. Problems might for instance arise if high quality Gaussian basis sets containing 
diffuse functions are used. Preliminary experience indicates that for small basis sets of 
moderate quality the duals are not so delocalized as to destroy the localization of Fij. 

In the case of the DC method the generalization to the non-orthogonal case is triv- 
ial. Since it is based on diagonalization within each subvolume one has to solve just the 
generalized eigenvalue problem (Equation (|108|) ) instead of an ordinary one. In the Den- 



sity Functional context, the DC method has actually only been used with non-orthogonal 
orbitals. 

The FOE method using a Chebychev representation of the density matrix has been 
generalized by Stephan et a/. (1998). It is easy to see that all the central equations of the 
FOE method remain correct if the Hamiltonian H is replaced by a modified Hamiltonian 
H (that is not any more hermitian) given by 

H = S^^H. (112) 

In particular, it remains true that the density matrix is given within arbitrary precision 
by 

npi 

F^il + T.^,T,{H) (113) 

if a sufficiently large Upi is used. The problem is how to find H efficiently. Even if S is 
a sparse matrix the inverse 5*"^ is not sparse in general and H would be a full matrix 
as well, destroying immediately the linear scaling. However, it turns out that the matrix 
elements of H decay faster than the corresponding matrix elements of H (Stephan 1998, 
Gibson 1993). One can therefore cut off the Tight Binding Hamiltonian H at the same 
distance where one would usually cut off H. In this way all the matrices involved are 
reduced to sparse matrices and H can be constructed by solving the set of linear systems 

SH = H . (114) 

Since both the right hand sides in H and the solution vectors making up H are sparse 
different systems of equations are only coupled by subblocks of 5*. So the big matrix 
inversion problem is decoupled into many small systems of equations and the scaling is 
therefore strictly linear. 

If the FOP method is used in connection with a rational approximation the gen- 
eralization to the non-orthogonal case can be done straightforwardly and without any 
approximation (Goedecker 1995) 
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One has then to solve a systems of hnear equations which is a generahzation of Equa- 
tion 

{H-z,S)F, = I (116) 

V 

A similar approach was adopted by Jayanthis et al. (1998). They formulated their method 
in terms of the Green function. However, in Equation ( p.l6| ) is a Green function for a 
complex energy z^, and the methods are essentially equivalent. 

If the FOE method is used to calculate Wannier functions the required projection 
operator Fp is slightly different from the density matrix and it is given by 

The DMM method has also been generalized to the non-orthogonal case (Nunes and 
Vanderbilt 1994). Introducing a modified density matrix F defined as 

F = S-^FS-^ (119) 

the DMM functional (|78D becomes 



n = Tr[{3FSF ~2FSFSF){H - fil)]. (120) 

This has the advantage that one does not have to invert S if one minimizes directly 
with respect to F. The calculation of the gradient of the DMM functional in the non- 
orthogonal case is a tricky point. The definition of the gradient is not as absolute as one 
might think. It is the direction of steepest descent per unit change of the variables, and one 
must therefore define a norm for the multidimensional space of variables before defining 
the gradient (D. Vanderbilt private communication). Two different gradient expression 
have been proposed by Nunes and Vanderbilt (1994) and by White et al. (1997) which 
correspond to two different choices of metric. The gradient of White et al. (1997) requires 
less minimization steps (Gillan et al., 1998), but each minimization step is more expensive 
since it requires the calculation of the inverse of the overlap matrix. From the point of 
view of overall numerical efficiency it is therefore not clear which gradient expression is 
more efficient. 

Another generalization (Millam and Scuseria 1997, Daniels et al., 1997) of the DMM 
method which is similar in spirit to Stephan's generalization of the FOE method con- 
sists in first performing a transformation to an orthogonal basis set by finding the LU 
decomposition of the overlap matrix 

S = U^U, (121) 

where U is an upper triangular matrix. If in addition U is approximated as a sparse 
matrix with m off-diagonal elements then Equation ( [L21| ) can be solved with a scaling 
proportional to n m? where n is the dimension of the matrices involved. Thus the scaling 
with respect to the size of the system is linear as it should be. 

The OM method can easily be generalized to the non-orthogonal case (Ordejon et 
al, 1996). The orbital overlap Z); cf* in the OM functional (Equation (|9^)) has to be 
generalized to Y.i,k cfSi^kC^. 
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8 The calculation of the selfconsistent potential 



We will now discuss an issue which is relevant only in selfconsistent electronic structure 
calculations, namely the calculation of the potential arising from the electronic charge. 
This potential consists essentially of two parts, the electrostatic or Hartree potential and 
the exchange correlation potential. 

8.1 The electrostatic potential 

The solution of Poisson's equation to find the electrostatic potential arising from a charge 
distribution p is a basic problem found in many branches of physics. Solution techniques 
are described in a wide variety of books and articles. We will therefore only point out 
a few aspects which are important in the special context of 0(N) electronic structure 
calculations. 

If one is dealing with an electronic charge density which has only one length scale 
Poisson's equation can be solved efficiently and with a scaling which is close to linear. 
Charge densities of this type are encountered in the context of pseudopotential calculations 
where one has eliminated the core electrons and where the characteristic length scale is 
the typical extension of an atomic valence electron. One can, for instance, use plane wave 
or multigrid techniques (Briggs) which both have a scaling proportional to n log(n) where 
n is the number of grid points. 

The situation becomes problematic when core electrons are included. In this case 
one could in principle still use the above mentioned techniques with a increased resolu- 
tion. One would still have linear scaling, but the prefactor would be so large to make it 
completely impractical both from timing and memory considerations. Exactly the same 
arguments apply to the representation of the wave functions themselves and for this reason 
ordinary plane waves are not used for all electron calculations. 

A widely used basis set for all electron calculations are Gaussian type basis sets (Boys 
1960). By varying the width of the Gaussians one can describe both core and valence 
electrons. The popularity of Gaussian type basis functions comes from the fact that 
many important operations can be done analytically (S. Obara and Saika, 1986) One 
property which we will use is that the product of two Gaussians is again a Gaussian 
centered in between the two original Gaussian type functions. The matrix elements of the 
electrostatic potential part of the Hamiltonian with respect to a set of Gaussian orbitals 
gi{r), i = 1, Mb are given by 

H.. = I drg,{r) U dr'^l '"'-' f'^f'A . (122) 
The elementary integral 

/dr/rfr- ^-(^^^^j;^^^('|^^^'(^^^ (123) 

can also be calculated analytically (S. Obara and Saika, 1986). A straightforward evalu- 
ation of Equation ( |122| ) would then result in a quartic scaling. There are, however, many 
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well known techniques (Challacombe et al, 1995) to reduce this scaling. The most obvi- 
ous trick starts from the observation that there is a negligible contribution to the charge 
density p if the Gaussians gi and gk are centered far apart. Consequently the charge 
density is not a sum over product Gaussians Gm, {Gm = 9i9j)) but only over Ma such 
Gaussians 

Affc Mb Ma 

P(r') = E E P^i 9k{r') gi{v') ^ ^ c„G„(r') . (124) 

k=l k=l m=l 

The size of the auxiliary basis set Ma is proportional to Mi, with a large prefactor which 
depends on the ratio of the largest to the smallest extension of the Gaussians as well as 
on the accuracy target. In a similar way matrix elements ifjj- become negligible if the 
basis functions gi and gj are centered very far apart. Using these two approximations one 
obtains a quadratic scheme with a very big prefactor. An approximate quadratic scaling 
is also found in numerical tests (Strout and Scuseria 1995). 

Another widely used method consists in fitting the charge density by a set of Ma 
auxiliary Gaussians Gj. Even though we use the same symbols (M^, Gi) as above they 
denote now somewhat different quantities. The allowed number of auxiliary Gaussians 
Ma is now much smaller, namely a few times M^. The auxiliary functions themselves are 
therefore not anymore taken to be all the possible product functions, but determined by 



empirical rules in such a way as to give the best possible fit in Equation ( |124| ). The fitting 
involves the solution of an ill conditioned system of linear equations and has therefore 
cubic scaling, however with a small prefactor. The evaluation of the matrix elements 
using this representation of the charge density has then quadratic scaling if one again 
neglects small elements. 

To obtain an even better scaling behavior requires the introduction of completely new 
concepts. One possibility is to build upon algorithms which solve the classical Coulomb 
problem for point particles. The classical Coulomb problem requires the calculation of the 
electrostatic potential arising from all the particles with charge Zj at all the positions 

By grouping particles into hierarchical groups and by describing their potential far away 
from such groups in an controlled approximate way by multipoles these fast algorithms 
allow to evaluate Equation (|125|) with linear scaling instead of the expected quadratic one. 
There are now several proposals (Strain et al., 1996, White et al., 1994, Perez- Jorda 1997), 
how one can modify one of these fast algorithms, the Fast Multipole Method (Greengard 
1994) in such a way that it can handle also the continuous charge distributions arising in 
the context of electronic structure calculations. The basic idea is fairly straightforward. As 
we saw the charge distributions is always given as a weighted sum of auxiliary Gaussians 
(Equation ( |124|) ). Now the electrostatic potential of such a Gaussian particle looks the 
same from a distance as the potential of a point particle. Concerning the far field, one 
can thus essentially take over the existing algorithms. To account for the non point like 
nature of the Gaussian particles one has however to correct the near field. Since the 
calculation of these local corrections have linear scaling the whole procedure has linear 
scaling as well. There are two problems with this kind of approach. First, one has only 
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linear scaling with respect to the size of the basis set if the volume covered by the basis set 
grows at least as fast as the size of the basis set. If one adds for example basis functions 
for a molecule of fixed size, to improve the accuracy of the basis set one does not any 
more have linear scaling. This is related to the fact that one can apply the fast far field 
treatment now to a smaller number of Gaussian particle interactions. This behavior can 
be easily understood by considering the extreme case where all Gaussian particles are 
centered very close together within a radius which is smaller than their width. In this 
case one evidently cannot use any more any far field techniques. The second problem 
is closely akin to the first. If one adds extended Gaussians to the system the efficiency 
deteriorates. 

A method which scales strictly linear with respect to the size of the basis set, inde- 
pendently of whether the volume is increased at the same time or not and which can be 
applied within the context of any basis set, is based on wavelets (Goedecker and Ivanov 
1998a). As the input to this method the charge density is needed on a real space grid which 
can have varying resolution. THus near the core region of the atoms in a molecule the 
resolution can be arbitrarily increased. Using interpolating wavelets this charge density 
can uniquely be mapped to a wavelet expansion, since a wavelet expansion can compactly 
describe nonuniform functions. In the wavelet basis one can then iteratively solve Poissons 
equation 

VV = -47rp. (126) 

The matrix representing the Laplace operator is sparse and the matrix times vector 
multiplications needed for the iterative solution of Equation (|126|) scale linearly. Using a 
preconditioning scheme in a basis of lifted wavelets the condition number is independent 
of the size and of the maximal resolution of the wavelet expansion and the number of 
iterations is therefore constant. Thus one obtains an overall linear scaling. 



8.2 The exchange correlation potential 

Within the most popular versions of Density Functional theory the exchange correlation 
potential is a purely local function. In the case of the Local Density Approximation 
(Parr and Yang) the exchange correlation potential at a certain point depends only on 
the density at that point, in the case of Generalized Gradient Approximations (Perdew et 
ai, 1996, Becke 1988, Lee et al, 1988) it depends in addition still on the gradient of the 
density at that point. If one uses real space methods such as finite differences or finite 
elements as well as plane wave methods where the calculation of the exchange correlation 
potential is done on a real space grid as well, it is obvious that the numerical effort is 
linear with respect to the system size. If one uses more extended basis functions such as 
Gaussian type orbitals it becomes more difficult to achieve linear scaling (Stratmann et 
al, 1996). 

In the case of Hartree Fock calculations the exchange energy 

(127) 

ij \^ ^ I 

seems to be as nonlocal as the Coulomb potential. Using Equation ( ^31) one can however 
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rewrite the expression (|127]) to obtain 



|r — r I 

showing that the exchange energy in an insulator is indeed a local quantity whose lo- 
cality is determined by the decay properties of the density matrix. A linear method to 
evaluate exchange terms within Gaussian type electronic structure calculations based on 
the aforementioned locality properties has been devised by Schwegler and Challacombe 
(1996). An alternative method based on the Fast Multipole Method has been developed 
by Burant et ai, (1996). 



9 Obtaining self-consistency 

To do a self-consistent electronic structure calculation, two ingredients have to be blended. 
The first is the calculation of the density matrix in a fixed external potential, a topic which 
is the main focus of this article. The second is the calculation of the potential from a 



given electronic charge density which was discussed in the preceeding section (|8.1| ). Even 
if both of these basic parts exhibit linear scaling, it is however not yet granted, that one 
has overall linear scaling. It might happen that the number of times one has to repeat 
these two basic parts increases with the size of the system. 

The easiest scheme to combine the calculation of the density matrix and the calculation 
of the potential is the so called scalar mixing scheme. Given an input charge density pin 
which determines the potential one obtains after the calculation of the density matrix for 
this potential via Equation (|12D a new output density pout- The new input density p"^"" is 
now not the output density pout, but rather a linear combination of the old input density 
and the output density 

prr(r) = P.n(r) + aipUr) - P.n(r)) . (129) 

Here a is the mixing parameter. Overall linear scaling is endangered if one has to decrease 
a for reasons of numerical stability as the system becomes bigger and if one consequently 
needs a larger number of iterations. 

The standard theory of mixing (Dederichs and Zeller 1983, Ho et al., 1982) is based 
on the dielectric response function in k space. Within this theory numerical instabilities 
arise if the the dielectric response functions tends to infinity as k tends to zero. This 
happens in a metal but not in an insulator where the dielectric response function always 
remains finite. Following the general philosophy of this paper to remain within a real 
space formalism, we will elucidate mixing under this perspective. The final conclusions 
are of course the same as the one based on the Fourier space theory. 

Let us first consider a metal. We assume that we are doing a calculation for a one- 
dimensional metallic structure of length L. Let us also assume that due to deviations 
from the converged self-consistent charge density we transfer an incremental charge AQm 
from one end of the sample to the other. Using the basic formula for the potential in a 
capacitor we get a constant electric field in the sample giving rise to a potential difference 
of Af/ = L AQin between the two ends. In a metal this potential difference will most 
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likely be larger than the HOMO LUMO separation (which vanishes for large systems) 
and we get a large charge transfer AQout- This charge transfer is related to the density of 
states at the Fermi level, D{ii), which in our one dimensional case is the number of states 
per length unit and per energy unit. So the total charge transfer AQg^t is given by 

AQ,^t^D{fx)LAU = D{fx)L^AQi^. (130) 

If this induced charge transfer AQout is larger than the initial transfer AQj„ then the 
charge transfer will exponentially increase in subsequent iterations and we have the nu- 
merical instability called "charge sloshing". To avoid it the mixing factor a has to be 
proportional to -^f^^^- Doing the same analysis in a three-dimensional structure all the 
lateral dimensions cancel and we get the same result concerning a. Denoting the volume 
of our sample by v we find that a is proportional to So a has to be decreased with 

increasing volume and the number of iterations in the mixing schemes increases with in- 
creasing system size. Fortunately and contrary to the implications of Annett (1995), this 
charge sloshing can be eliminated by state-of-the-art techniques (Kresse, 1996). One pos- 
sibility (Kerker 1981) is just to do the mixing in Fourier space and to have a k dependent 
mixing parameter a{k) — ctop^q^ 

prr(k) = Pin(k) + «o^^^(Po«t(k) - p,„(k)) . (131) 

As we see long wavelength components (corresponding to small k values) are now strongly 
damped by 

ao^2^ao(y)' (132) 

and the dampening has the correct dimensional behavior with respect to the wavelength 
A which corresponds to the length L in our above dimensional analysis. Short wavelength 
contributions are just damped by and this constant dampening sets in when k becomes 
comparable in magnitude to k^. We know, that for wavelengths of the order of the 
interatomic spacing a mixing parameter somewhat smaller than 1 works well and so we 
can determine by these conditions the values of ckq and k^. 

Let us next examine whether we can have charge sloshing in an insulator. We will 
assume that the potential difference across the sample is not larger than the gap, in 
which case the discussion for the metallic case would rather apply. Again we consider 
a sample of length L. According to the modern theory of polarization in solids (King- 
Smith and Vanderbilt 1989) a polarization arises because the centers of the Wannier 
functions are displaced under the action of an electric field. Since the Wannier functions 
are exponentially localized, the charge which will build up at the two surfaces of our 
sample is mainly due to the displacements of the Wannier function in the elementary cells 
of the crystal right on the surface and the charge AQout is thus practically independent 
of the length of the sample. So the optimal mixing constant a is nearly independent of 
the size of the system and the number of iterations as well. 

In conclusion, we see that linear scaling can also be obtained in the selfconsistent case 
and that even in a metal charge sloshing problems can be overcome. 
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Mixing is the natural choice if the DC or the FOE methods are used in a self-consistent 
calculation. If methods based on minimization (DMM and OM) are used one can alterna- 
tively also obtain the ground state by a single minimization loop (R. Car and M. Parrinello 
1985, I. Stich et al, 1989, M. P. Teter et al, 1989, M. Payne, et al, 1992 ) without distin- 
guishing between density matrix optimization cycles and potential mixing cycles. As is 
not surprising after our discussion of mixing one finds (Annett 1995) that in an insulator 
the number of iterations does not depend upon whether one has a self-consistent type of 
calculation where the potential is varying during each minimization step or whether one 
has a fixed potential. In other words there is no charge sloshing. In metallic systems it 
is essential to have finite electronic temperature (Wentzcovitch et al., 1992, Weinert and 
Davenport 1992, Kresse 1996) and therefore the minimization schemes cannot be applied 
straightforwardly in any case. Insofar Annett's analysis (Annett 1995) showing that in 
this case the scaling is at least proportional to N^l'^ is irrelevant. 

A completely different approach to the mixing problem has recently been proposed 
by Gonze (1996). He calculates the gradient of the total energy with respect to the 
potential. His gradient expression does not depend on the wavefunctions and could thus 
well be combined with 0(N) schemes. 

10 Applications of 0(N) methods 

This chapter is not intended to be a comprehensive or even complete review of all the 
applications which were done using 0(N) methods. It is rather intended as an illustration 
of the wide range of areas where 0(N) methods allowed to study systems which were too 
big to be studied with conventional methods. In general one can say that most large-scale 
Tight Binding studies are nowadays done in the connection with 0(N) methods. In those 
cases systems comprising from a few hundred up to many thousand atoms are typically 
studied. Treating such a large number of atoms with 0(N) Density Functional methods 
is much more difficult. In the case of Density Functional calculations the benchmarking 
and verification aspect is usually dominating whereas in tight binding calculations the 
focus was in most cases on how to solve challenging physical problems. 

Questions concerning extended defects in crystalline materials were one of the main 
focus of these Tight Binding studies. Because several good Tight Binding parameters are 
available for silicon, most studies were done for this material. The 90° partial dislocation 
in silicon was at the focus of interest of a series of tight binding studies. The three 
structures that were examined are shown in Figure |TB|. 

The energy difference between the structure (a) and (b) of Figure [1^ was studied both 
by Nunes et al. (1996) and by Hansen et al. (1995). Even though they used different TB 
parameters and different 0(N) algorithms (DMM and FOE) the both obtained exactly 
the energy difference of .18eV/A in favor of structure (b). Later Benetto and al. (1997) 
discovered a new structure (c) that is even lower in energy. To validate their tight bind- 
ing results they did conventional density functional calculations for smaller subsystems 
finding perfect agreement with the 0(N) tight binding results. This new structure is ex- 
perimentally difficult to distinguish form structure (b) and so this result is a convincing 
illustration of the power of these new 0(N) algorithms in materials science. All these 
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(a) 




Figure 16: (a) Symmetric reconstruction of the 90° partial dislocation in silicon. Shaded 
area indicates stacking fault, (b) the single-period symmetry breaking reconstruction, (c) 
The double-period symmetry breaking reconstruction. The figure is reproduced with the 
kind permission of the authors from Nunes et al. (1998). 



tight binding calculations necessitated electronic structure calculations involving a few 
hundred atoms and would therefore have been unfeasible with standard algorithms. 

Extended {311} defects in silicon systems containing more than 1000 atoms and their 
relation to point defects were studied by Kim et al. (1997) using the OM method in the 
improved version of Kim et al. (1995). An understanding of these processes is important 
for the fabrication of semiconductor devices, since defects have a strong influence of the 
diffusion properties of semiconductors. Unfortunately, the more realistic questions involv- 
ing boron dopant atoms in addition to the bulk silicon atoms can not be treated with 
current tight binding models. 

Ismail-Beigi and Arias (1998) examined the surface reconstruction properties of silicon 
nanobars, finding that the influence of edges in these small structures is strong enough to 
lead to surface reconstructions that are different form the ones in bulk silicon. They also 
both employed traditional density functional calculations and 0(N) FOE tight binding 
calculations and also found good agreement between both for small subsystem which are 
accessible to both approaches. 

Roberts and Clancy (1998) simulated vacancy and interstitial diffusion processes in 
silicon using the FOE tight binding method. The diffusion constants they obtain are 
in good agreement with similar calculations based on classical force fields and density 
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functional calculations. Compared to the density functional calculations they could also 
significantly enlarge both the number of atoms (216) and the simulation times. The 
diffusion constant predicted by all these simulations is however orders of magnitude larger 
than the experimental one, a fact for which no explanation is known up to now. 

Besides silicon there is another material for which several good Tight Binding schemes 
are available, namely carbon. FuUerene systems are therefore another focus of Tight 
Binding studies. 

Galli and Mauri (1994) did molecular crash test of Cqq fuUerenes colliding with a di- 
amond surface using the OM method. They found three different impact energy regimes 
where the impinging fullerenes either survive the collision undamaged, slightly damaged 
or get completely destroyed. Even though the interaction region between the impinging 
fuUerene and the surface does not comprise a very large number of atoms, their com- 
putational box contained more than 1000 atoms. The reason why the box has to be so 
large is that the phonons emitted during the collision may not be reflected back from the 
walls of the box during the time scale of the collision. This reflection of phonons is also 
a serious problem in classical force field simulations of crack propagation where for this 
reason systems comprising several million atoms are sometimes necessary (Zhou et ai, 
1997). In the case of this molecular crash test most of the carbon atoms are propagating 
the phonons. Phonons are well described by classical force fields and one could use this 
scheme for the majority of the atoms, while it would be necessary to use the more expen- 
sive tight binding scheme only for the atoms in the collision region. Unfortunately such 
schemes combining molecular methods of different speed and accuracy have not yet been 
developed and thus it is therfore a feature of many 0(N) calculations that one is doing 
an overkill in a certain sense, treating a large number of essentially inactive atoms with 
highly accurate methods. Canning et al. (1997) examined thin films of C28 fullerenes 
with the same method, finding that thin superatom films can be formed. 

The equilibrium geometries of large fullerenes such as C240 was also studied by several 
groups with 0(N) techniques. The central question here is whether such large fullerenes 
have a spherical form or a polyhedrally faceted shape, where nearly fiat polyhedral regions 
are alternating with edges where the curvature is concentrated. York et al. (1994) used 
the original formulation of the DC method in terms of densities to do density functional 
calculations of C240 and found spherical shapes. Itoh and al. using both empirical and ab 
initio tight binding in the context of the OM method found however polyhedral shapes. 
This result is also supported by Xu and Scuseria (1996) who investigated fullerenes up 
to 6*8640 using the DM method. The optimized geometries they found for various large 
fullerenes are shown in Figure p . 

Ajayan et al. (1998) used tight binding FOE molecular dynamics to simulate irradi- 
ation mediated knock-out of carbon atoms out of carbon nanotubes. In agreement with 
experimental observations they found that this atom removal leads to a shrinking of the 
diameter of the nanotube, but leaves the nanotube essentially intact until the diameter 
is practically zero. They could identify in their virtual 400 atom sample processes on 
the atomic level that are responsible for the rapid healing of the defects created by the 
removal of atoms. 

Recently developed Tight Binding parameters (Horsfield 1996a) made it possible to 
study also a composite system, namely hydrocarbons. This set of tight binding parameters 
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Figure 17: Tight binding equilibrium geometries of selected icosahedral fullerenes viewed 
along the C2 symmetry axis. This figure is reproduced with the kind permission of the 
authors from Xu and Scuseria (1996). 



includes some kind of self-consistency by imposing a local charge neutrality requirement. 
Local charge neutrality is essential if different phases are studied because it prevents any 
unphysically large charge transfer between different phases arising from different chemical 
potentials in different phases. Using this new tight binding scheme, Kress et al. (1998) 
studied the dissociation of CH^ under high pressure and at high pressure using the FOE 
molecular dynamics. Previous Density functional based molecular dynamics studies by 
Ancilotto et al. (1997) were limited to very small system sizes of 16 CH^ molecules 
and short simulation times of 2 ps. At variance with experimental findings these density 
functional simulations could not find a phase separation of methane into hydrogen and 
carbon. FOE molecular dynamics allowed to treat much larger systems of 128 molecules 
and also much longer simulation times of 8 ps. After 4 ps a phase separation was indeed 
observed. 

Sanchez-Portal et al. (1997) compared the experimental X-ray structure of a large 
DNA molecule comprising 650 atoms with the geometric structure obtained from a den- 
sity functional based OM relaxation. They obtained a root mean square deviation from 
the experimental geometry of 0.23 A. Their method relies however on fairly drastic ap- 
proximations resulting in errors that are by far larger than the error one generally expects 
from a density functional calculation. 

A similar study of a large biomolecule is reported by Lewis et al. (1997). 
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York et al. used the DC method in the context of the semi-empirical AMI method 
(Dewar et al, 1985) to calculate heats of formation, solvation free energies and densities 
of states for protein and DNA systems containing up to 2700 atoms. 

Daniels and Scuseria reported AMI semi- empirical benchmark calculations for up to 
20000 atoms using DMM, FOE and pseudo-diagonalization methods. 

Applications of 0(N) methods within Density Functional theory, that use basis sets 
large enough such the basis set errors are not dominating the Density Functional error 
at present do not exist. If calculations of this type were done, such as the calculations 
of a cell containing 6000 silicon atoms by Goringe et al. (1997), then the performance 
evaluation aspect was always the dominating one. With the advance of faster computers 
and improved algorithms this situation will, however, certainly soon change. It is also 
interesting to note in this context that the 1998 version of the very popular Gaussian 
software package will contain 0(N) algorithms. 

Let us finally come back to a point briefly mentioned in the introduction. The de- 
velopment of 0(N) methods has also deepened our understanding of locality in quantum 
mechanical systems and has thereby also fostered the development of theories based on a 
local picture rather than the conventional nonlocal Bloch function picture. An example is 
the theory of polarization in crystalline materials by King-Smith and Vanderbilt (1989). 
Their theory is based on a local picture in terms of Wannier functions and allows for an 
intuitive understanding of these phenomena that were difficult to understand before. 

11 Conclusions 

0(N) methods have become an essential part of most large scale atomistic simulations 
based either on Tight Binding or semiempirical methods. The physical foundations of 
0(N) methods are well understood. They are related to the decay properties of the 
density matrix. The use of 0(N) methods within Density Functional methods is still 
waiting for their widespread appearance. All the algorithms that would allow us to treat 
very large basis sets within density functional theory have certain shortcomings. The 
OM and OBDMM method suffer from ill-conditioning problems, and in both the OM 
and FOP method detailed knowledge about the bonding properties is required to form 
the input guess. Thus there is probably still some algorithmic progress necessary before 
these obstacles can be overcome. It is also not quite clear what the localization properties 
of very large complicated molecules are and whether perhaps a quadratic scaling rather 
than a linear scaling is the optimum one can obtain in certain cases. It is clear that the 
elimination of the cubic scaling bottleneck is a very important achievement and that it 
will pave the way for calculations of unprecedented size in the future. Such calculations 
will not only be beneficial to physics, but they will also nourish progress in many related 
fields such as chemistry, materials science and biology. Even with 0(N) algorithms it will 
not be possible in the foreseeable future to treat systems containing millions of atoms at 
a highly accurate Density functional level using large basis sets, as would be necessary 
for certain materials science applications. Such problems can only be approached if one 
succeeds in combining methods of different accuracy such as density functional methods 
with classical force fields, applying the high accuracy method only to regions where the 
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low accuracy method is expected to fail. Hybrid methods of this type will certainly be 
based on the same notions of locahty and similar techniques as 0(N) methods. 
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13 Appendix: Decay properties of Fourier transforms 



The density matrix is defined in terms of a Fourier transform given by Equation (pG]). The 
decay properties of the density matrix are thereby closely related to the decay properties 
of Fourier transforms. All the properties described in this paragraph are well known, for 
completeness we will briefly outline them. 

For simplicity let us just consider a one- dimensional Fourier transform 



oo 

ikr 



g{r)= e''^g{k)dk, (133) 

J — oo 

where g{k) is an integrable piecewise continuous function tending rapidly to for k tending 
to ±oo. 

For any function g{k) of this type, g{r) will obviously tend to zero when r tends to 
infinity. In this case e^^^ is a very rapidly oscillating function and the product e^^^g{k) 
will therefore change sign very rapidly and thus the integral will tend to zero. The exact 
decay properties depend on how many derivatives are continuous. Let us consider first a 
function which is piecewise constant and has only a finite number of discontinuities. A 
function which falls into this class is the function g{k) which is 1 in the interval [-1:1] and 
zero everywhere else. Calculating the Fourier transform one finds g{r) = 2^^^^. Since 
any piecewise function g{k) can be written as a linear combination of the above prototype 
function, its transform g{r) will always decay like 1/r. Using integration by parts we see 
that ^ ^ 

r^g{r) = J^jik) dk = {-i)-' j^j'^ g{k)dk . (134) 

If the l-th derivative is integrable then the integral will vanish for the reasons discussed 
above. So if we can do / integrations by parts each transformation will accelerate the decay 
by one inverse power of r and we can do such a transformation whenever our function 
has at least continuous first derivatives. Hence we arrive at the rule that if / derivatives 
of g{k) are continuous, g{r) will decay like r^^'-+'^\ 

If we have a function g{k) which is analytic, i.e. one for which an infinite number of 
derivatives exists, then the transform will decay faster than any power of r. One then says 
that it decays exponentially instead of algebraically. This notion of exponential decay does 
not necessarily mean that it decays strictly like an exponential function. As an example 
we could for instance take g{k) = exp(— /c^), where we know that the transform is again a 
Gaussian and decays thus faster than an ordinary exponential function. The rate of decay 
will be related to the smallest length scale of g{k). If the smallest length scale of g{k) 
is kmin then g{r) will roughly decay like exp(— c|r|A;mj„), where c is a constant of order 
of 1. This follows from the fact that one will have an important cancellation of terms of 
opposite sign in the integral in Equation (|133|) only if several oscillations occur within the 
interval kmm- 

Another qualitative feature of the Fourier transform is that it will have oscillations 
whenever g{k) is shifted off center. The oscillation period is determined by this shift. As an 
example let us look at the Fourier transform of a shifted Gaussian g{k) = exp(— |(A; — a)^). 
The result is g{r) = \/2Tr exp(zar) exp(— |r^), which is the transform of the unshifted 
Gaussian times an oscillatory term. 
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